PARAMETRIC AND NON-PARAMETRIC SCHEMES FOR DISCRETE
TIME SIGNAL DISCRIMINATION

Chapter  1 
Introduction 

In  this  thesis  we  consider  the  problem  of  discriminating  between  classes  of  discrete  time 
signals.  The  simplest  case  of  discrete  time  signal  discrimination  is  the  binary  discrimination 
problem.  In  this  case,  a  random  discrete  time  signal  is  observed  and  must  be  classified 
into  one  of  two  categories.  Typically  the  discrimination  method  is  designed  to  optimize  some 
measure  of  performance;  this  measure  of  performance  is  usually  related  to  the  probability 
of  error  and/or  the  number  of  samples  used  to  make  a  decision.  The  binary  discrimination 
problem  is  faced  often  in  radar  applications,  where  the  receiver  must  decide  whether  the 
observed  signal  is  from  a  target  of  interest  or  a  decoy.  Throughout  this  thesis,  we  present 
results  on  signal  discrimination  for  arbitrary  classes  of  data  without  assuming  what  the  data 
represent  or  from  what  structure/implementation  they  are  obtained.  However,  we  shall  often 
try  to  relate  our  results  to  the  problem  of  binary  discrimination  faced  by  a  radar  receiver. 

We refer to the two classes of signals from which the observed data originate as
hypotheses $H_1$ and $H_0$. The observed data sequence is denoted as $\{Z_i\}_{i=1}^{n}$.
Under hypothesis $H_i$, $i = 0, 1$ (i.e., hypothesis $H_i$ is true), the observed data
sequence has the n-dimensional probability density function (pdf)
$f_i(z_1, z_2, \ldots, z_n)$. More specifically, we consider

$$
\begin{aligned}
H_1 &: \{Z_i\}_{i=1}^{n} \text{ has pdf } f_1(z_1, z_2, \ldots, z_n) = f_1(\mathbf{z}) \\
H_0 &: \{Z_i\}_{i=1}^{n} \text{ has pdf } f_0(z_1, z_2, \ldots, z_n) = f_0(\mathbf{z})
\end{aligned}
\tag{1.1}
$$

where $\mathbf{z}$ represents the n-tuple $(z_1, z_2, \ldots, z_n)$. Note that we do not
constrain the data to be independent; various assumptions of the correlation between
samples will be made later in this thesis.

If the n-dimensional pdfs under each hypothesis were known by the discriminator
designer, a likelihood ratio test could be implemented. The likelihood ratio test is of the form

$$
d(\mathbf{z}) =
\begin{cases}
1, & \dfrac{f_1(\mathbf{z})}{f_0(\mathbf{z})} \ge \eta \\[2ex]
0, & \dfrac{f_1(\mathbf{z})}{f_0(\mathbf{z})} < \eta
\end{cases}
\tag{1.2}
$$

where $\eta$ is a constant to be determined. Hypothesis $H_i$ is chosen by the discriminator
when $d(\mathbf{z}) = i$, $i = 0, 1$. Likelihood ratio tests are well known and optimal in the
Bayes, Neyman-Pearson, and minimax senses [1]; the choice of $\eta$ depends upon which
criterion the designer chooses to optimize. However, we assume that the n-dimensional pdfs
are not known.
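As a point of reference for the test in (1.2), the following is a minimal sketch of a likelihood ratio decision when both pdfs are available. The independent Gaussian model, the function names, and the threshold value are illustrative assumptions only; the thesis itself does not assume independence or known n-dimensional pdfs.

```python
import math

def gaussian_pdf(z, mean, std):
    """Joint pdf of independent Gaussian samples (an illustrative model only)."""
    return math.prod(
        math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))
        for x in z
    )

def likelihood_ratio_test(z, f1, f0, eta):
    """Return 1 (choose H1) if f1(z)/f0(z) >= eta, otherwise 0 (choose H0)."""
    # Comparing f1 >= eta * f0 avoids dividing by a very small f0(z).
    return 1 if f1(z) >= eta * f0(z) else 0

# Hypothetical hypotheses: unit-variance Gaussians with means 1 (H1) and 0 (H0).
f1 = lambda z: gaussian_pdf(z, mean=1.0, std=1.0)
f0 = lambda z: gaussian_pdf(z, mean=0.0, std=1.0)

decision_high = likelihood_ratio_test([0.9, 1.2, 1.1], f1, f0, eta=1.0)
decision_low = likelihood_ratio_test([-0.2, 0.1, -0.4], f1, f0, eta=1.0)
```

The threshold `eta` would be set by whichever optimality criterion (Bayes, Neyman-Pearson, or minimax) the designer adopts.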

It is further assumed that the data sequence is strictly stationary; that is, the statistics
do not vary with time:

$$
f_i(z_1, z_2, \ldots, z_n) = f_i(z_{k+1}, z_{k+2}, \ldots, z_{k+n}), \quad i = 0, 1; \; k \text{ arbitrary}. \tag{1.3}
$$


As mentioned above, we do not constrain the data to represent any specific signal.
However, for the radar problem, some possibilities are samples of the envelope detector
output, matched filter output, or even phase data. Figure 1.1 illustrates a scheme for
discriminating between radar targets by using envelope samples.

The radar uses a simple pulse modulated waveform. The pulse modulator block
generates the pulsed waveform. This in turn is fed into the transmitter to be modulated to radio
frequency (RF). This signal is then input to the duplexer, which isolates the transmitter and
receiver during transmission and reception. During transmission, the receiver is
effectively disconnected from the antenna, while during reception the transmitter is disconnected
from the antenna. The pulsed radio frequency signal is then radiated through the antenna.
If the antenna is pointing at an object, some portion of the signal may be reflected towards
the antenna. By this time, the duplexer has switched the antenna to the receiver circuitry.
The incoming waveform is amplified by a radio frequency amplifier and then mixed to an
intermediate frequency (IF). This signal is then passed through the matched filter of the IF
amplifier to maximize the signal to noise ratio (SNR). The output of this block is then
envelope detected. A portion of the envelope signal is routed through the video amplifier and into
a display: either an A-scope or a PPI (plan position indicator).

The other portions of the envelope signal are routed to the ADT (automatic detection
and tracking) circuitry and to the discriminator circuitry. The ADT determines if targets are
present, initiates track on new targets, and determines how to set the pointing angles of
the antenna. The ADT therefore communicates with the display circuitry and the antenna
control circuitry. The ADT also notifies the discriminator circuitry that a target has been
detected. The discriminator then begins its tests by obtaining samples of the envelope signal
in the time intervals corresponding to the target's position. When the discriminator makes a
decision, it can instruct the ADT to continue tracking the target (if it is a target of interest) or
to drop the target from track (if it is a decoy or a target of little interest). Figure 1.2 illustrates
a method that a discriminator may utilize in obtaining data samples. The figure is
a diagram of five pulse repetition intervals (PRIs). The rectangular pulses represent the
pulse waveform to be modulated and transmitted. The random signal between the pulses
represents the envelope signal. The data samples $Z_0, Z_1, Z_2, \ldots$ are obtained by sampling
the envelope signal within the range gate corresponding to the object being discriminated.
In Figure 1.2, only one sample per target per range bin is obtained.

Figure 1.2: Extracting Data Samples from the Radar Return

The above implementation is just one example of how a discriminator can be
implemented in a practical system. However, the structure of the discriminator block was not detailed
in the above example. There are several approaches to designing the discriminator block.
Figure 1.3 illustrates some possible approaches to designing a discriminator. The first
approach is to model the physics generating the data under each hypothesis. Then the pdfs of
the data under each hypothesis may be assumed or derived, thus allowing a discriminator
to be implemented. Another approach is to collect actual data, estimate pdfs of the data
under each hypothesis, and then implement a discriminator. The last method is to collect
data, train a discriminator with a supervised learning algorithm via simulation, and then
implement the discriminator. The first approach may be very difficult and mathematically
intractable. The other two approaches are more easily adaptable to any problem since they
assume no model of the physics which generate the data sequence.

In this thesis, we consider all three of the above approaches to designing a
discriminator. In Chapter 2, it is assumed that marginal and bivariate pdfs of the data under
each hypothesis are known to the discriminator designer. Optimal memoryless quantizer
discriminators are designed using the marginal and bivariate pdfs (actually cumulative
distribution functions, denoted as cdfs). The discriminators use a test statistic of the form
$T_n = \sum_{i=1}^{n} Q(Z_i)$, where $Q(x)$ is a quantization function chosen to maximize a suitable
performance measure. The approach used to design the discriminator corresponds to the
first approach of Figure 1.3 and is parametric since the pdfs are assumed known.

Figure 1.3: Possible Approaches to Designing a Discriminator

In Chapter 3 it is assumed that the pdfs are not known. The approach used to design
a discriminator in this chapter corresponds to the second approach in Figure 1.3.
Non-parametric estimates of the marginal and bivariate pdfs of the data under each hypothesis
are formed and fed into the expressions for the optimal memoryless quantizer discriminators
derived in Chapter 2. The estimates are formed by collecting data prior to the design of the
discriminator; the data are fed into kernel density estimators. The data for estimation are
denoted as

Cmj  (1-4) 

where $i = 0, 1$ denotes hypothesis $H_0$ or $H_1$ respectively, where $m = 0, 1, \ldots, M - 1$ denotes
the sample path number, and where $j = 0, 1, 2, \ldots, N - 1$ denotes the sample number. Thus
under each hypothesis, we have $M$ sample paths (i.e., sequences) which are $N$ data samples
long. It is assumed that the $M$ sample paths are independent of each other. Throughout
this thesis we refer to these data sequences as the training data.
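To make the role of the training data concrete, here is a minimal sketch of a Gaussian-kernel estimate of a marginal pdf formed from $M$ sample paths of length $N$. The kernel choice, the bandwidth value, the synthetic Gaussian data, and all names are assumptions for illustration; pooling samples across time relies on the stationarity assumption stated earlier, and the estimators actually used in Chapter 3 may differ.

```python
import math
import random

def kernel_density_estimate(samples, bandwidth):
    """Return a function x -> Gaussian-kernel estimate of the marginal pdf."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def pdf_hat(x):
        # Average of Gaussian bumps of width `bandwidth` centred on each sample.
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return pdf_hat

random.seed(0)
M, N = 20, 50  # hypothetical numbers of sample paths and samples per path
training = [[random.gauss(0.0, 1.0) for _ in range(N)] for _ in range(M)]

# Under stationarity every sample shares the same marginal, so all M*N values are pooled.
pooled = [z for path in training for z in path]
pdf_hat = kernel_density_estimate(pooled, bandwidth=0.3)
```

A bivariate estimate would be built the same way from pairs of samples within each path, using a two-dimensional kernel.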

In Chapter 4, the final approach to designing a discriminator is considered. The
discriminators from Chapters 2 and 3 required only marginal and bivariate pdfs due to
their memoryless property. However, it is suspected that memory improves performance for
correlated data. In Chapter 4, discriminators which have a test statistic of the form
$T_n = \sum_{i} g(Z_{K-1+i}, Z_{K-2+i}, \ldots, Z_i)$ are considered, so that each term depends on $K$
consecutive samples. To find the optimal nonlinearity $g(\cdot)$, pdfs
of higher order than the bivariate pdfs would have to be assumed or estimated; the estimation
of the higher-order pdfs may not be practical and the assumption or derivation of such pdfs
may be mathematically intractable. To avoid the difficulty in obtaining these pdfs,
multiple-layer perceptron neural networks are trained to act as the nonlinearity $g(z_1, z_2, \ldots, z_K)$
using the back propagation algorithm.

It is likely that once a discriminator is implemented, it will encounter data from pdfs
different from those with which it was designed. Obviously, the designer wants the
discriminator to be robust to these conditions. In Chapter 5, some simulation results are presented
on the mismatch of the pdfs. These results give some indication of the robustness
characteristics of the discriminators presented in this thesis.

Note that all discriminator models in this thesis make decisions on the basis of
sequential tests. These tests, upon obtaining a new data sample, either classify the sequence
or decide to obtain a new data sample. Sequential tests are used in situations requiring fast
and accurate decisions.

Discriminators using similar structures to the ones in this thesis can be implemented to
form decisions on the basis of fixed length blocks of data. However, fixed sample size tests
are beyond the scope of this thesis.


Chapter  2 


Memoryless  Quantizer 
Discriminators 


In this chapter, we derive optimal memoryless quantizer discriminators for use in our
binary discrimination problem. The data sequence is assumed stationary and m-dependent.
Here m-dependent means that, under $H_i$, $Z_k$ and $Z_l$ are correlated for $|k - l| \le m_i$ and are
independent for $|k - l| > m_i$. These discriminators operate on the data sequence $\{Z_i\}_{i=1}^{n}$
by computing the test statistic $T_n = \sum_{i=1}^{n} Q(Z_i)$. $Q(x)$ is a quantizer function chosen to
maximize a suitable performance measure.

The test may be performed in either a block or sequential fashion. Both tests are
based on the fact that, as $n$ tends towards infinity, the distribution of $T_n$ approaches a
Gaussian distribution with mean $n\mu_i$ and variance $n\sigma_i^2$, for $i = 0, 1$, corresponding to
hypotheses $H_0$ and $H_1$ respectively. $\mu_i$ is defined by

$$
\mu_i(Q) = E_i\{Q(Z_1)\}, \quad i = 0, 1 \tag{2.1}
$$

where $E_i$ denotes expectation under hypothesis $H_i$. $\sigma_i^2$ is defined by

$$
\sigma_i^2(Q) = \lim_{n \to \infty} n^{-1}\,\mathrm{Var}_i(T_n), \quad i = 0, 1 \tag{2.2}
$$

and is given by

$$
\sigma_i^2(Q) = \mathrm{Var}_i(Q(Z_1)) + 2\sum_{j=1}^{m_i} \mathrm{Cov}_i\{Q(Z_1), Q(Z_{j+1})\}. \tag{2.3}
$$

$\mathrm{Var}_i$ and $\mathrm{Cov}_i$ denote variance and covariance under hypothesis $H_i$, and $m_i$ is the
m-dependence length under hypothesis $H_i$. [2] gives a proof using a central limit theorem
which shows that $T_n$ is asymptotically Gaussian under hypothesis $H_i$ with mean $n\mu_i$ and
variance $n\sigma_i^2$, provided that $\sigma_i^2 > 0$.

Optimal quantization has been studied by others (see [3] and [4]) for the related
hypothesis testing problem concerning the detection of weak signals in additive noise. These
studies employed block tests, where $T_n$ was compared to a decision threshold. Quantizer functions
were chosen to maximize the well known asymptotic relative efficiency (ARE). Given two
detectors, $\varphi_1$ and $\varphi_2$, the ARE of detector $\varphi_1$ relative to detector $\varphi_2$ is defined as

$$
\mathrm{ARE}(1,2) = \lim_{n \to \infty,\; \theta \to 0} e(\alpha, \theta, n), \tag{2.4}
$$

where $e(\alpha, \theta, n)$ is the relative number of samples $\varphi_1$ requires to achieve the same
probability of detection that $\varphi_2$ achieves for sample size $n$ when both $\varphi_1$ and $\varphi_2$ have false alarm
probability $\alpha$ and signal strength $\theta$. Under certain regularity conditions (see [2]), the ARE
for two quantizer detectors has the form

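$$
\mathrm{ARE}(1,2) = \frac{\eta(Q_1)}{\eta(Q_2)} \tag{2.5}
$$

where the quantity $\eta(Q)$ is the efficacy of the detector $\varphi$ using $Q$, and is given by

$$
\eta(Q) = \frac{\left(\int Q f'\right)^2}{\sigma^2(Q)}. \tag{2.6}
$$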

Here, $f'$ is the derivative of the noise marginal probability density function with respect to
signal strength $\theta$. The optimal quantizer is the quantizer which maximizes the ARE; this
quantizer also maximizes the efficacy.

[4] derived the optimal quantizer for the weak signal detection case with independent
noise, while [2] derived the optimal quantizer for the m-dependent noise case. For the
discrimination problem, [5] has derived optimal nonlinearities which maximize signal to noise
type performance measures of the form

$$
S_4(g) = \frac{(\mu_1(g) - \mu_0(g))^2}{\nu\sigma_1^2(g) + (1 - \nu)\sigma_0^2(g)} \tag{2.7}
$$

where $\nu \in [0, 1]$. These detectors operated in a block fashion, forming a test statistic
$T_n = \sum_{i=1}^{n} g(Z_i)$ which was compared to a decision threshold.

Recently, [6] derived optimal nonlinearities for use in a sequential discrimination
scheme. These sequential discriminators operated by forming a test statistic of the form
$T_n = \sum_{i=1}^{n} g(Z_i)$. Another test statistic was formed, either as a linear expression of $T_n$,
$S_n = A T_n + B n$, or as a quadratic expression of $T_n$, $S_n = A T_n^2 + B T_n + C n + D$. $S_n$ was
then compared to two decision thresholds; if the upper threshold, $b$, was exceeded, $H_1$ was
declared. If $S_n$ dropped below the lower threshold, $a$, $H_0$ was declared. Otherwise, another
sample $Z_{n+1}$ was obtained, $T_{n+1}$ and $S_{n+1}$ were computed, and the threshold tests were
repeated. This continued until one of the thresholds was crossed. The nonlinearities were
chosen to minimize the average sample size required to terminate the test. This criterion
is important for the class of problems where a fast decision is needed as well as a reliable
decision (i.e., small error probabilities). [6] used the well known Wald thresholds [7] of
$b = \ln((1 - \beta)/\alpha)$ and $a = \ln(\beta/(1 - \alpha))$, where $\alpha$ and $\beta$ are the desired false alarm and
miss probabilities. The corresponding optimal values of $A$ and $B$
were $A = 2(\mu_1 - \mu_0)/(\sigma_1^2 + \sigma_0^2)$ and $B = -2(\mu_1 - \mu_0)(\mu_1\sigma_0^2 + \mu_0\sigma_1^2)/(\sigma_1^2 + \sigma_0^2)^2$.
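The sequential procedure described above can be sketched as follows for the linear statistic $S_n = A T_n + B n$. This is only an illustration under stated assumptions: the function names are hypothetical, the nonlinearity `g` is passed in by the caller, and the means and variances are treated as given numbers rather than derived from any particular pdf.

```python
import math

def wald_thresholds(alpha, beta):
    """Return (a, b) with a = ln(beta/(1-alpha)) and b = ln((1-beta)/alpha)."""
    return math.log(beta / (1.0 - alpha)), math.log((1.0 - beta) / alpha)

def linear_coefficients(mu0, mu1, var0, var1):
    """Coefficients A and B of S_n = A*T_n + B*n as quoted in the text."""
    A = 2.0 * (mu1 - mu0) / (var1 + var0)
    B = -2.0 * (mu1 - mu0) * (mu1 * var0 + mu0 * var1) / (var1 + var0) ** 2
    return A, B

def sequential_test(samples, g, A, B, a, b):
    """Return (decision, n): 1 for H1, 0 for H0, None if the data run out first."""
    T = 0.0
    for n, z in enumerate(samples, start=1):
        T += g(z)            # T_n = sum of g(Z_i)
        S = A * T + B * n    # linear test statistic
        if S > b:
            return 1, n      # upper threshold exceeded: declare H1
        if S < a:
            return 0, n      # dropped below lower threshold: declare H0
    return None, len(samples)

# Hypothetical numbers: unit variances, means 0 (H0) and 1 (H1), 1% error targets.
a, b = wald_thresholds(alpha=0.01, beta=0.01)
A, B = linear_coefficients(mu0=0.0, mu1=1.0, var0=1.0, var1=1.0)
result_h1 = sequential_test([1.0] * 50, lambda z: z, A, B, a, b)
result_h0 = sequential_test([0.0] * 50, lambda z: z, A, B, a, b)
```

Returning `None` when the samples are exhausted is a design choice of this sketch; a practical system might instead truncate the test and force a decision.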

[6] showed that the optimal nonlinearity solved a nonlinear integral equation; this was
the result of a more complex performance measure than (2.7). However, [6] also considered
a suboptimal nonlinearity which solved a linear integral equation; this nonlinearity was
the result of a performance measure with the form of (2.7). The suboptimal nonlinearity
performed nearly as well as the optimal nonlinearity and was much easier to solve for because
of the linear integral equation. Since [5] and [6] have shown that performance measures with
the form of (2.7) result in good block and sequential discriminators, we only consider
quantizers that maximize a performance measure of the form given in (2.7). Being consistent
with the subscript notation in [5], we state our problem as finding a quantizer that maximizes
the performance measure

$$
S_4(Q) = \frac{(\mu_1(Q) - \mu_0(Q))^2}{\nu\sigma_1^2(Q) + (1 - \nu)\sigma_0^2(Q)}. \tag{2.8}
$$


Now we define the notation used in this chapter. The quantizer function $Q(x)$ is
defined as $Q = (\mathbf{q}, \mathbf{t})$, where $\mathbf{q} = (q_1, q_2, \ldots, q_M)^T$ is the quantization level vector and
$\mathbf{t} = (t_0, t_1, \ldots, t_M)$ is an ordered breakpoint vector. These define $Q(x)$ by

$$
Q(x) = q_k \quad \text{when } x \in (t_{k-1}, t_k], \quad k = 1, 2, \ldots, M. \tag{2.9}
$$

With this definition of $Q(x)$, we sometimes also use the notation $S_4(Q) = S_4(\mathbf{q})$.


2.1 Derivation of the Mean and Variance for a Quantizer Function


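To maximize the performance measure $S_4(\mathbf{q})$, the mean $\mu_i$ and variance $\sigma_i^2$ of the quantizer
function must be evaluated under both hypotheses ($i = 0, 1$). The mean of the quantizer is
given by

$$
\begin{aligned}
\mu_i[Q(Z)] &= \int_0^{+\infty} Q(z) f_i(z)\,dz
 = \sum_{k=1}^{M} q_k \int_{t_{k-1}}^{t_k} f_i(z)\,dz \\
&= \sum_{k=1}^{M} q_k \Pr\nolimits_i\{Z \in (t_{k-1}, t_k]\}
 = \sum_{k=1}^{M} q_k \left[F_i(t_k) - F_i(t_{k-1})\right]
 = \mathbf{q}^T(\Delta\mathbf{F}_i),
\end{aligned}
$$

where $(\Delta\mathbf{F}_i)$ is the vector with elements $F_i(t_k) - F_i(t_{k-1})$, $k = 1, 2, \ldots, M$.
$F_i(x)$ is the cumulative distribution function under hypothesis $H_i$ and $\Pr_i(A)$ is the
probability of event $A$ occurring under hypothesis $H_i$. Note that, in the above integrals, we have
assumed that $f_i(x) = 0$ for $x < 0$; this is the result of the envelope detector output of the
radar system being always non-negative. The variance of the quantizer $Q(Z)$ is evaluated
as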


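$$
\begin{aligned}
\sigma_i^2[Q] &= \mathrm{Var}_i[Q(Z_1)] + 2\sum_{j=1}^{m_i} \mathrm{Cov}_i[Q(Z_1), Q(Z_{j+1})] \\
&= E_i[Q^2(Z_1)] - \mu_i^2[Q(Z_1)]
 + 2\sum_{j=1}^{m_i} E_i\big\{\big[Q(Z_1) - \mu_i[Q(Z_1)]\big]\big[Q(Z_{j+1}) - \mu_i[Q(Z_{j+1})]\big]\big\} \\
&= E_i[Q^2(Z_1)] - \mu_i^2[Q(Z_1)]
 + 2\sum_{j=1}^{m_i} \big\{E_i[Q(Z_1)Q(Z_{j+1})] - \mu_i^2[Q(Z_1)]\big\} \\
&= E_i[Q^2(Z_1)] - (2m_i + 1)\mu_i^2[Q(Z_1)]
 + 2\sum_{j=1}^{m_i} E_i[Q(Z_1)Q(Z_{j+1})].
\end{aligned}
\tag{2.13}
$$

The power $E_i[Q^2(Z_1)]$ is evaluated by

$$
\begin{aligned}
E_i[Q^2(Z_1)] &= \int_0^{+\infty} Q^2(z) f_i(z)\,dz \\
&= \int_0^{+\infty} \Big[\sum_{k=1}^{M} q_k I_{(t_{k-1}, t_k]}(z)\Big]^2 f_i(z)\,dz \\
&= \int_0^{+\infty} \sum_{k=1}^{M} q_k^2 I_{(t_{k-1}, t_k]}(z) f_i(z)\,dz \\
&= \sum_{k=1}^{M} \int_{t_{k-1}}^{t_k} q_k^2 f_i(z)\,dz \\
&= \sum_{k=1}^{M} q_k^2 \Pr\nolimits_i\{Z \in (t_{k-1}, t_k]\} \\
&= \sum_{k=1}^{M} q_k^2 \left[F_i(t_k) - F_i(t_{k-1})\right] \\
&= \mathbf{q}^T \mathbf{F}_i \mathbf{q}
\end{aligned}
\tag{2.14}
$$

where $I_{(a, b]}(z)$ equals 1 for $z \in (a, b]$ and 0 otherwise, and where the matrix $\mathbf{F}_i$ is defined as

$$
\mathbf{F}_i \triangleq \mathrm{diag}\{F_i(t_1) - F_i(t_0),\; F_i(t_2) - F_i(t_1),\; \ldots,\; F_i(t_M) - F_i(t_{M-1})\}. \tag{2.15}
$$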


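The squared term in (2.13) can be rewritten as

$$
\begin{aligned}
\mu_i^2[Q(Z_1)] &= \sum_{k=1}^{M} q_k\left[F_i(t_k) - F_i(t_{k-1})\right] \sum_{l=1}^{M} q_l\left[F_i(t_l) - F_i(t_{l-1})\right] \\
&= \sum_{k=1}^{M}\sum_{l=1}^{M} q_k q_l \left[F_i(t_k) - F_i(t_{k-1})\right]\left[F_i(t_l) - F_i(t_{l-1})\right] \\
&= \mathbf{q}^T (\Delta\mathbf{F}_i)(\Delta\mathbf{F}_i)^T \mathbf{q}.
\end{aligned}
\tag{2.16}
$$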


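Finally, the last term in (2.13) is given by

$$
\begin{aligned}
2\sum_{j=1}^{m_i} E_i[Q(Z_1)Q(Z_{j+1})]
&= 2\sum_{j=1}^{m_i} E_i\Big[\sum_{k=1}^{M} q_k I_{(t_{k-1}, t_k]}(Z_1) \sum_{l=1}^{M} q_l I_{(t_{l-1}, t_l]}(Z_{j+1})\Big] \\
&= 2\sum_{j=1}^{m_i} E_i\Big[\sum_{k=1}^{M}\sum_{l=1}^{M} q_k q_l I_{(t_{k-1}, t_k]}(Z_1) I_{(t_{l-1}, t_l]}(Z_{j+1})\Big] \\
&= 2\sum_{j=1}^{m_i}\sum_{k=1}^{M}\sum_{l=1}^{M} q_k q_l E_i\big[I_{(t_{k-1}, t_k]}(Z_1) I_{(t_{l-1}, t_l]}(Z_{j+1})\big] \\
&= 2\sum_{j=1}^{m_i}\sum_{k=1}^{M}\sum_{l=1}^{M} q_k q_l \Pr\nolimits_i\{Z_1 \in (t_{k-1}, t_k] \text{ AND } Z_{j+1} \in (t_{l-1}, t_l]\} \\
&= \mathbf{q}^T \mathbf{P}_i \mathbf{q},
\end{aligned}
\tag{2.17}
$$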


where the matrix $\mathbf{P}_i$ is defined by its elements

$$
[\mathbf{P}_i]_{kl} = 2\sum_{j=1}^{m_i} \Pr\nolimits_i\{Z_1 \in (t_{k-1}, t_k] \text{ AND } Z_{j+1} \in (t_{l-1}, t_l]\}. \tag{2.18}
$$


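So now the variance $\sigma_i^2[Q(Z)]$ can be obtained by combining equations (2.14) through (2.18)
to yield

$$
\begin{aligned}
\sigma_i^2[Q(Z)] &= \mathbf{q}^T \mathbf{F}_i \mathbf{q} - (2m_i + 1)\,\mathbf{q}^T(\Delta\mathbf{F}_i)(\Delta\mathbf{F}_i)^T\mathbf{q} + \mathbf{q}^T\mathbf{P}_i\mathbf{q} \\
&= \mathbf{q}^T\left[\mathbf{F}_i - (2m_i + 1)(\Delta\mathbf{F}_i)(\Delta\mathbf{F}_i)^T + \mathbf{P}_i\right]\mathbf{q} \\
&= \mathbf{q}^T[\tilde{\mathbf{P}}_i + \mathbf{F}_i]\mathbf{q}.
\end{aligned}
\tag{2.19}
$$

The matrix $\tilde{\mathbf{P}}_i$ is defined as

$$
\tilde{\mathbf{P}}_i = \mathbf{P}_i - (2m_i + 1)(\Delta\mathbf{F}_i)(\Delta\mathbf{F}_i)^T. \tag{2.20}
$$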
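A minimal sketch of how the mean and variance expressions above might be evaluated numerically is given below for the simplest situation, independent samples ($m_i = 0$), where the sum in (2.18) is empty and the variance reduces to $\mathbf{q}^T\mathbf{F}_i\mathbf{q} - (\mathbf{q}^T\Delta\mathbf{F}_i)^2$. The Rayleigh cdf, the two-level quantizer, and the function names are assumptions chosen for illustration; the correlated case would additionally need the bivariate interval probabilities.

```python
import math

def rayleigh_cdf(x, sigma):
    """Rayleigh cdf; zero for x <= 0, matching the non-negative envelope."""
    if x <= 0.0:
        return 0.0
    if math.isinf(x):
        return 1.0
    return 1.0 - math.exp(-x * x / (2.0 * sigma * sigma))

def interval_probabilities(cdf, breakpoints):
    """Entries F(t_k) - F(t_{k-1}), k = 1..M: the vector Delta F (diagonal of F)."""
    return [cdf(hi) - cdf(lo) for lo, hi in zip(breakpoints, breakpoints[1:])]

def quantizer_mean(levels, delta_f):
    """Mean q^T (Delta F)."""
    return sum(q * p for q, p in zip(levels, delta_f))

def quantizer_variance_independent(levels, delta_f):
    """Variance q^T F q - (q^T Delta F)^2, valid only when m_i = 0."""
    power = sum(q * q * p for q, p in zip(levels, delta_f))
    return power - quantizer_mean(levels, delta_f) ** 2

# Hypothetical two-level quantizer with its breakpoint at the Rayleigh median.
sigma = 1.0
median = sigma * math.sqrt(2.0 * math.log(2.0))
breakpoints = [0.0, median, math.inf]
levels = [0.0, 1.0]
delta_f = interval_probabilities(lambda x: rayleigh_cdf(x, sigma), breakpoints)
mean = quantizer_mean(levels, delta_f)
variance = quantizer_variance_independent(levels, delta_f)
```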


2.2 Evaluation of the Performance Measure for a Quantizer Function


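Using the expressions for $\mu_i[Q(Z)]$ and $\sigma_i^2[Q(Z)]$ from the previous section, the value of the
performance measure for a quantizer function is given by

$$
\begin{aligned}
S_4(Q) &= \frac{[\mu_1 - \mu_0]^2}{\nu\sigma_1^2 + (1 - \nu)\sigma_0^2} \\
&= \frac{\left[\mathbf{q}^T(\Delta\mathbf{F}_1) - \mathbf{q}^T(\Delta\mathbf{F}_0)\right]^2}
        {\nu\,\mathbf{q}^T[\tilde{\mathbf{P}}_1 + \mathbf{F}_1]\mathbf{q} + (1 - \nu)\,\mathbf{q}^T[\tilde{\mathbf{P}}_0 + \mathbf{F}_0]\mathbf{q}} \\
&= \frac{\left[\mathbf{q}^T[(\Delta\mathbf{F}_1) - (\Delta\mathbf{F}_0)]\right]^2}
        {\mathbf{q}^T\left[\nu[\tilde{\mathbf{P}}_1 + \mathbf{F}_1] + (1 - \nu)[\tilde{\mathbf{P}}_0 + \mathbf{F}_0]\right]\mathbf{q}} \\
&= \frac{\left[[(\Delta\mathbf{F}_1) - (\Delta\mathbf{F}_0)]^T\mathbf{q}\right]^2}
        {\mathbf{q}^T\left[\nu[\tilde{\mathbf{P}}_1 + \mathbf{F}_1] + (1 - \nu)[\tilde{\mathbf{P}}_0 + \mathbf{F}_0]\right]\mathbf{q}}.
\end{aligned}
\tag{2.21}
$$

2.3 Evaluation of the Optimal Quantizer Function for Specified Breakpoints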

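A necessary condition for the performance measure to be maximized is for the gradient with
respect to the level vector $\mathbf{q}$ to be zero. So we need to evaluate the gradient of equation
(2.21). This is given by

$$
\begin{aligned}
\nabla_{\mathbf{q}} S_4(Q) &= \nabla_{\mathbf{q}}\left[\frac{\left[\mathbf{q}^T[(\Delta\mathbf{F}_1) - (\Delta\mathbf{F}_0)]\right]^2}
        {\mathbf{q}^T\left[\nu[\tilde{\mathbf{P}}_1 + \mathbf{F}_1] + (1 - \nu)[\tilde{\mathbf{P}}_0 + \mathbf{F}_0]\right]\mathbf{q}}\right] \\
&= \frac{2\left[\mathbf{q}^T[(\Delta\mathbf{F}_1) - (\Delta\mathbf{F}_0)]\right][(\Delta\mathbf{F}_1) - (\Delta\mathbf{F}_0)]}
        {\mathbf{q}^T\left[\nu[\tilde{\mathbf{P}}_1 + \mathbf{F}_1] + (1 - \nu)[\tilde{\mathbf{P}}_0 + \mathbf{F}_0]\right]\mathbf{q}} \\
&\quad - \frac{2\left[\mathbf{q}^T[(\Delta\mathbf{F}_1) - (\Delta\mathbf{F}_0)]\right]^2\left[\nu[\tilde{\mathbf{P}}_1 + \mathbf{F}_1] + (1 - \nu)[\tilde{\mathbf{P}}_0 + \mathbf{F}_0]\right]\mathbf{q}}
        {\left[\mathbf{q}^T\left[\nu[\tilde{\mathbf{P}}_1 + \mathbf{F}_1] + (1 - \nu)[\tilde{\mathbf{P}}_0 + \mathbf{F}_0]\right]\mathbf{q}\right]^2}
\end{aligned}
\tag{2.22}
$$

where $\nabla_{\mathbf{q}}$ denotes gradient with respect to the level vector $\mathbf{q}$. Now define $\mathbf{q}^{\circ}$ as the vector
which maximizes $S_4(Q)$:

$$
\mathbf{q}^{\circ} = \arg\Big\{\max_{\mathbf{q} \in \mathbb{R}^M} S_4(Q)\Big\}. \tag{2.23}
$$

The necessary condition is

$$
\nabla_{\mathbf{q}} S_4(Q)\Big|_{\mathbf{q} = \mathbf{q}^{\circ}} = 0. \tag{2.24}
$$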


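If $\left[\nu[\tilde{\mathbf{P}}_1 + \mathbf{F}_1] + (1 - \nu)[\tilde{\mathbf{P}}_0 + \mathbf{F}_0]\right]$ is positive definite and
$\mathbf{q}^T[(\Delta\mathbf{F}_1) - (\Delta\mathbf{F}_0)] > 0$ (i.e., $\mu_1 > \mu_0$), then equation (2.24) reduces to

$$
[(\Delta\mathbf{F}_1) - (\Delta\mathbf{F}_0)] - \lambda(\mathbf{q}^{\circ})\left[\nu[\tilde{\mathbf{P}}_1 + \mathbf{F}_1] + (1 - \nu)[\tilde{\mathbf{P}}_0 + \mathbf{F}_0]\right]\mathbf{q}^{\circ} = 0 \tag{2.25}
$$

where the multiplier $\lambda(\mathbf{q})$ is defined as

$$
\lambda(\mathbf{q}) = \frac{\mathbf{q}^T[(\Delta\mathbf{F}_1) - (\Delta\mathbf{F}_0)]}
{\mathbf{q}^T\left[\nu[\tilde{\mathbf{P}}_1 + \mathbf{F}_1] + (1 - \nu)[\tilde{\mathbf{P}}_0 + \mathbf{F}_0]\right]\mathbf{q}}. \tag{2.26}
$$

So we have

$$
\mathbf{q}^{\circ} = \frac{\left[\nu[\tilde{\mathbf{P}}_1 + \mathbf{F}_1] + (1 - \nu)[\tilde{\mathbf{P}}_0 + \mathbf{F}_0]\right]^{-1}[(\Delta\mathbf{F}_1) - (\Delta\mathbf{F}_0)]}{\lambda(\mathbf{q}^{\circ})} \tag{2.27}
$$

as an expression for the optimal quantization function for fixed breakpoints. $S_4(Q)$ remains
unchanged by scaling $Q$ (i.e., $S_4(Q) = S_4(aQ)$, where $a$ is a constant). This implies that
$\lambda(\mathbf{q}^{\circ})$ does not affect the value of $S_4(Q)$, so all solutions to (2.25) are equivalent. One
particular solution is where $\lambda(\mathbf{q}^{\circ}) = 1$:

$$
\mathbf{q}^{\circ} = \left[\nu[\tilde{\mathbf{P}}_1 + \mathbf{F}_1] + (1 - \nu)[\tilde{\mathbf{P}}_0 + \mathbf{F}_0]\right]^{-1}[(\Delta\mathbf{F}_1) - (\Delta\mathbf{F}_0)]. \tag{2.28}
$$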


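This is equivalent to

$$
\left[\nu[\tilde{\mathbf{P}}_1 + \mathbf{F}_1] + (1 - \nu)[\tilde{\mathbf{P}}_0 + \mathbf{F}_0]\right]\mathbf{q}^{\circ} - [(\Delta\mathbf{F}_1) - (\Delta\mathbf{F}_0)] = 0. \tag{2.29}
$$

The matrix $[\nu\mathbf{F}_1 + (1 - \nu)\mathbf{F}_0]$ has the form

$$
\begin{aligned}
[\nu\mathbf{F}_1 + (1 - \nu)\mathbf{F}_0] = \mathrm{diag}\{&\nu[F_1(t_1) - F_1(t_0)] + (1 - \nu)[F_0(t_1) - F_0(t_0)], \\
&\nu[F_1(t_2) - F_1(t_1)] + (1 - \nu)[F_0(t_2) - F_0(t_1)], \;\ldots, \\
&\nu[F_1(t_M) - F_1(t_{M-1})] + (1 - \nu)[F_0(t_M) - F_0(t_{M-1})]\}.
\end{aligned}
\tag{2.30}
$$

All diagonal terms of $[\nu\mathbf{F}_1 + (1 - \nu)\mathbf{F}_0]$ are probabilities, and they are positive whenever each
quantization interval has nonzero probability under the hypotheses, so its inverse
$[\nu\mathbf{F}_1 + (1 - \nu)\mathbf{F}_0]^{-1}$ exists. This allows equation (2.29) to be written as

$$
\begin{aligned}
&[\nu\mathbf{F}_1 + (1 - \nu)\mathbf{F}_0]^{-1}\left[\nu[\tilde{\mathbf{P}}_1 + \mathbf{F}_1] + (1 - \nu)[\tilde{\mathbf{P}}_0 + \mathbf{F}_0]\right]\mathbf{q}^{\circ} \\
&\quad - [\nu\mathbf{F}_1 + (1 - \nu)\mathbf{F}_0]^{-1}[(\Delta\mathbf{F}_1) - (\Delta\mathbf{F}_0)] = 0.
\end{aligned}
\tag{2.31}
$$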

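This can be simplified to

$$
[\mathbf{I} + \mathbf{K}_4]\,\mathbf{q}^{\circ} - [\nu\mathbf{F}_1 + (1 - \nu)\mathbf{F}_0]^{-1}[(\Delta\mathbf{F}_1) - (\Delta\mathbf{F}_0)] = 0 \tag{2.32}
$$

where $\mathbf{I}$ is the $(M \times M)$ identity matrix and where we define

$$
\mathbf{K}_4 = [\nu\mathbf{F}_1 + (1 - \nu)\mathbf{F}_0]^{-1}[\nu\tilde{\mathbf{P}}_1 + (1 - \nu)\tilde{\mathbf{P}}_0]. \tag{2.33}
$$

The components of $\mathbf{K}_4$ are given by

$$
[\mathbf{K}_4]_{kl} = \frac{[\nu\tilde{\mathbf{P}}_1 + (1 - \nu)\tilde{\mathbf{P}}_0]_{kl}}
{\nu[F_1(t_k) - F_1(t_{k-1})] + (1 - \nu)[F_0(t_k) - F_0(t_{k-1})]}.
$$

We can define the vector $\mathbf{b}_4$ as

$$
\mathbf{b}_4 = [\nu\mathbf{F}_1 + (1 - \nu)\mathbf{F}_0]^{-1}[(\Delta\mathbf{F}_1) - (\Delta\mathbf{F}_0)]
= \begin{bmatrix}
\dfrac{F_1(t_1) - F_1(t_0) - F_0(t_1) + F_0(t_0)}{\nu[F_1(t_1) - F_1(t_0)] + (1 - \nu)[F_0(t_1) - F_0(t_0)]} \\[2ex]
\dfrac{F_1(t_2) - F_1(t_1) - F_0(t_2) + F_0(t_1)}{\nu[F_1(t_2) - F_1(t_1)] + (1 - \nu)[F_0(t_2) - F_0(t_1)]} \\[1ex]
\vdots \\[1ex]
\dfrac{F_1(t_M) - F_1(t_{M-1}) - F_0(t_M) + F_0(t_{M-1})}{\nu[F_1(t_M) - F_1(t_{M-1})] + (1 - \nu)[F_0(t_M) - F_0(t_{M-1})]}
\end{bmatrix}.
\tag{2.34}
$$

So we can rewrite equation (2.29) as

$$
[\mathbf{I} + \mathbf{K}_4]\,\mathbf{q}^{\circ} - \mathbf{b}_4 = 0. \tag{2.35}
$$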

2.4  Evaluation  of  the  Optimal  Performance  Measure 


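In this section, the optimal performance measure for fixed breakpoints is derived. This value
is the maximum value a quantizer with the given breakpoint vector $\mathbf{t}$ can achieve. We start
with the performance measure

$$
S_4(Q) = S_4(\mathbf{q}(\mathbf{t}))\Big|_{\mathbf{q} = \mathbf{q}^{\circ}}
= \frac{\left[\mathbf{q}^T[(\Delta\mathbf{F}_1) - (\Delta\mathbf{F}_0)]\right]^2}
{\mathbf{q}^T\left[\nu[\tilde{\mathbf{P}}_1 + \mathbf{F}_1] + (1 - \nu)[\tilde{\mathbf{P}}_0 + \mathbf{F}_0]\right]\mathbf{q}}\Bigg|_{\mathbf{q} = \mathbf{q}^{\circ}}. \tag{2.36}
$$

Now the expression for the optimal quantizer levels for fixed breakpoints,

$$
\mathbf{q}^{\circ} = \left[\nu[\tilde{\mathbf{P}}_1 + \mathbf{F}_1] + (1 - \nu)[\tilde{\mathbf{P}}_0 + \mathbf{F}_0]\right]^{-1}[(\Delta\mathbf{F}_1) - (\Delta\mathbf{F}_0)], \tag{2.37}
$$

is substituted into the above expression to yield

$$
\begin{aligned}
S_4(\mathbf{q}^{\circ}) &= \frac{\left[[(\Delta\mathbf{F}_1) - (\Delta\mathbf{F}_0)]^T\left[\nu[\tilde{\mathbf{P}}_1 + \mathbf{F}_1] + (1 - \nu)[\tilde{\mathbf{P}}_0 + \mathbf{F}_0]\right]^{-1}[(\Delta\mathbf{F}_1) - (\Delta\mathbf{F}_0)]\right]^2}
{[(\Delta\mathbf{F}_1) - (\Delta\mathbf{F}_0)]^T\left[\nu[\tilde{\mathbf{P}}_1 + \mathbf{F}_1] + (1 - \nu)[\tilde{\mathbf{P}}_0 + \mathbf{F}_0]\right]^{-1}[(\Delta\mathbf{F}_1) - (\Delta\mathbf{F}_0)]} \\
&= [(\Delta\mathbf{F}_1) - (\Delta\mathbf{F}_0)]^T\left[\nu[\tilde{\mathbf{P}}_1 + \mathbf{F}_1] + (1 - \nu)[\tilde{\mathbf{P}}_0 + \mathbf{F}_0]\right]^{-1}[(\Delta\mathbf{F}_1) - (\Delta\mathbf{F}_0)].
\end{aligned}
$$

(2.38)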
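The closed forms (2.28) and (2.38) can be checked numerically on a small example. In the sketch below, `C` stands for the bracketed matrix $\nu[\tilde{\mathbf{P}}_1 + \mathbf{F}_1] + (1 - \nu)[\tilde{\mathbf{P}}_0 + \mathbf{F}_0]$ and `v` for $(\Delta\mathbf{F}_1) - (\Delta\mathbf{F}_0)$; the numerical values are arbitrary placeholders chosen so that `C` is positive definite, and the small Gaussian-elimination solver is a stand-in for whatever linear algebra routine an implementation would actually use.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def performance(q, C, v):
    """Performance measure (q^T v)^2 / (q^T C q)."""
    numerator = sum(qi * vi for qi, vi in zip(q, v)) ** 2
    Cq = [sum(cij * qj for cij, qj in zip(row, q)) for row in C]
    return numerator / sum(qi * ci for qi, ci in zip(q, Cq))

# Placeholder values: a symmetric, diagonally dominant (hence positive definite) C.
C = [[2.0, 0.5, 0.0],
     [0.5, 1.0, 0.2],
     [0.0, 0.2, 1.5]]
v = [0.3, -0.1, 0.4]

q_opt = solve(C, v)                 # levels from q = C^{-1} v, as in (2.28)
s_opt = performance(q_opt, C, v)    # should equal v^T C^{-1} v, as in (2.38)
```

One practical caveat, offered as an observation about implementation rather than a statement from the text: when the quantization intervals cover the whole support, each row of $\tilde{\mathbf{P}}_i + \mathbf{F}_i$ built from exact probabilities sums to zero, so a direct solve may need one level fixed or a pseudo-inverse.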


2.5  Sufficiency  of  the  Solution  (2.28) 


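The solution (2.28) has been shown to satisfy a necessary condition for maximizing the
performance measure $S_4(\mathbf{q})$. In this section, the Schwarz inequality is used to show that (2.28)
is also sufficient for maximizing

$$
S_4(\mathbf{q}) = \frac{(\mu_1(\mathbf{q}) - \mu_0(\mathbf{q}))^2}{\nu\sigma_1^2(\mathbf{q}) + (1 - \nu)\sigma_0^2(\mathbf{q})}. \tag{2.39}
$$

For simplification purposes define

$$
\mathbf{C} \triangleq \left[\nu[\tilde{\mathbf{P}}_1 + \mathbf{F}_1] + (1 - \nu)[\tilde{\mathbf{P}}_0 + \mathbf{F}_0]\right] \tag{2.40}
$$

and

$$
\mathbf{v} \triangleq [(\Delta\mathbf{F}_1) - (\Delta\mathbf{F}_0)]. \tag{2.41}
$$

By substituting these into the expression for $S_4(\mathbf{q})$ we obtain

$$
S_4(\mathbf{q}) = \frac{[\mathbf{q}^T\mathbf{v}]^2}{\mathbf{q}^T\mathbf{C}\mathbf{q}}
= \frac{[\mathbf{q}^T\mathbf{v}]^2}{\mathbf{q}^T\mathbf{C}\mathbf{q}}\left[\frac{\mathbf{v}^T\mathbf{q}^{\circ}}{\mathbf{v}^T\mathbf{q}^{\circ}}\right]^2
= \frac{[\mathbf{q}^T\mathbf{v}\mathbf{v}^T\mathbf{q}^{\circ}]^2}{\mathbf{q}^T\mathbf{C}\mathbf{q}\,[\mathbf{v}^T\mathbf{q}^{\circ}]^2}. \tag{2.42}
$$

Now by the Schwarz inequality, which for two vectors $\mathbf{x}$ and $\mathbf{z}$ implies that
$(\mathbf{x}^T\mathbf{z})^2 \le \mathbf{x}^T\mathbf{x}\,\mathbf{z}^T\mathbf{z}$, we obtain

$$
\begin{aligned}
S_4(\mathbf{q}) &= \frac{\mathbf{q}^T\mathbf{v}\mathbf{v}^T\mathbf{q}\,(\mathbf{q}^{\circ})^T\mathbf{v}\mathbf{v}^T\mathbf{q}^{\circ}}{\mathbf{q}^T\mathbf{C}\mathbf{q}\;\mathbf{v}^T\mathbf{q}^{\circ}\,\mathbf{v}^T\mathbf{q}^{\circ}} \\
&= \frac{\mathbf{q}^T\mathbf{v}\mathbf{v}^T\mathbf{q}\,(\mathbf{q}^{\circ})^T\mathbf{v}}{\mathbf{q}^T\mathbf{C}\mathbf{q}\,(\mathbf{q}^{\circ})^T\mathbf{v}} \\
&\le \frac{\mathbf{q}^T\mathbf{v}\mathbf{v}^T\mathbf{q}\,(\mathbf{q}^{\circ})^T\mathbf{v}}{\mathbf{q}^T\mathbf{C}\mathbf{q}^{\circ}\;\mathbf{q}^T\mathbf{v}}.
\end{aligned}
\tag{2.43}
$$

The last step applies the inequality with $\mathbf{x} = \mathbf{C}^{1/2}\mathbf{q}$ and $\mathbf{z} = \mathbf{C}^{1/2}\mathbf{q}^{\circ}$, together with
$\mathbf{C}\mathbf{q}^{\circ} = \mathbf{v}$ from (2.28), which gives
$\mathbf{q}^T\mathbf{C}\mathbf{q}^{\circ}\;\mathbf{q}^T\mathbf{v} = (\mathbf{q}^T\mathbf{C}\mathbf{q}^{\circ})^2 \le \mathbf{q}^T\mathbf{C}\mathbf{q}\,(\mathbf{q}^{\circ})^T\mathbf{C}\mathbf{q}^{\circ} = \mathbf{q}^T\mathbf{C}\mathbf{q}\,(\mathbf{q}^{\circ})^T\mathbf{v}$.
Now by substituting the expression

$$
\mathbf{q}^{\circ} = \left[\nu[\tilde{\mathbf{P}}_1 + \mathbf{F}_1] + (1 - \nu)[\tilde{\mathbf{P}}_0 + \mathbf{F}_0]\right]^{-1}[(\Delta\mathbf{F}_1) - (\Delta\mathbf{F}_0)] = \mathbf{C}^{-1}\mathbf{v} \tag{2.44}
$$

into the denominator of the above expression, we obtain

$$
\begin{aligned}
S_4(\mathbf{q}) &\le \frac{\mathbf{q}^T\mathbf{v}\mathbf{v}^T\mathbf{q}\,(\mathbf{q}^{\circ})^T\mathbf{v}}{\mathbf{q}^T\mathbf{C}\mathbf{C}^{-1}\mathbf{v}\;\mathbf{q}^T\mathbf{v}} \\
&= \frac{\mathbf{q}^T\mathbf{v}\mathbf{v}^T\mathbf{q}\,(\mathbf{q}^{\circ})^T\mathbf{v}}{\mathbf{q}^T\mathbf{v}\;\mathbf{q}^T\mathbf{v}} \\
&= \frac{\mathbf{q}^T\mathbf{v}\mathbf{v}^T\mathbf{q}\,(\mathbf{q}^{\circ})^T\mathbf{v}}{\mathbf{q}^T\mathbf{v}\mathbf{v}^T\mathbf{q}} \\
&= (\mathbf{q}^{\circ})^T\mathbf{v} = S_4(\mathbf{q}^{\circ}).
\end{aligned}
\tag{2.45}
$$

Thus we have now shown that the expression for the optimal quantizer satisfies the necessary
and sufficient conditions for optimality.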


2.6  Evaluation  of  Quantizer  with  Optimal  Levels  and  Breakpoints 

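Here we derive the quantizer function with optimal levels and breakpoints that maximizes
the performance measure $S_4(\mathbf{q})$. Specifically, we are maximizing the function $S_4(\mathbf{q}^{\circ}(\mathbf{t}))$. We
need to evaluate the gradient of the performance measure with respect to the breakpoint
vector. With $\mathbf{C}$ and $\mathbf{v}$ as defined in (2.40) and (2.41),

$$
\begin{aligned}
\frac{\partial}{\partial t_k} S_4(\mathbf{q}^{\circ}(\mathbf{t})) &= \frac{\partial}{\partial t_k}\left\{\mathbf{v}^T\mathbf{C}^{-1}\mathbf{v}\right\} \\
&= \left[\frac{\partial}{\partial t_k}\mathbf{v}^T\right]\mathbf{C}^{-1}\mathbf{v}
 + \mathbf{v}^T\left[\frac{\partial}{\partial t_k}\mathbf{C}^{-1}\right]\mathbf{v}
 + \mathbf{v}^T\mathbf{C}^{-1}\left[\frac{\partial}{\partial t_k}\mathbf{v}\right].
\end{aligned}
\tag{2.46}
$$

We use the fact for invertible matrices that
$\frac{\partial}{\partial t}\mathbf{A}^{-1} = -\mathbf{A}^{-1}\left[\frac{\partial}{\partial t}\mathbf{A}\right]\mathbf{A}^{-1}$ and equation (2.46) to obtain

$$
\frac{\partial}{\partial t_k} S_4(\mathbf{q}^{\circ}(\mathbf{t})) = \left[\frac{\partial}{\partial t_k}\mathbf{v}^T\right]\mathbf{q}^{\circ}
 - (\mathbf{q}^{\circ})^T\left[\frac{\partial}{\partial t_k}\mathbf{C}\right]\mathbf{q}^{\circ}
 + (\mathbf{q}^{\circ})^T\left[\frac{\partial}{\partial t_k}\mathbf{v}\right]. \tag{2.47}
$$

This further reduces to

$$
\begin{aligned}
\frac{\partial}{\partial t_k} S_4(\mathbf{q}^{\circ}(\mathbf{t})) &= 2(\mathbf{q}^{\circ})^T\left[\frac{\partial}{\partial t_k}\mathbf{v}\right]
 - (\mathbf{q}^{\circ})^T\left[\frac{\partial}{\partial t_k}\mathbf{C}\right]\mathbf{q}^{\circ} \\
&= (\mathbf{q}^{\circ})^T\left[2\frac{\partial}{\partial t_k}\mathbf{v} - \left[\frac{\partial}{\partial t_k}\mathbf{C}\right]\mathbf{q}^{\circ}\right].
\end{aligned}
\tag{2.48}
$$

So a necessary condition for the vector $\mathbf{t}$ to maximize the performance measure is

$$
0 = (\mathbf{q}^{\circ})^T\left[2\frac{\partial}{\partial t_k}[(\Delta\mathbf{F}_1) - (\Delta\mathbf{F}_0)]
 - \left[\frac{\partial}{\partial t_k}\left[\nu[\tilde{\mathbf{P}}_1 + \mathbf{F}_1] + (1 - \nu)[\tilde{\mathbf{P}}_0 + \mathbf{F}_0]\right]\right]\mathbf{q}^{\circ}\right]
$$

for $k = 1, 2, \ldots, M - 1$.

(2.51)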

2.7  Numerical  Results 


In this section, we evaluate the performance of the memoryless quantizer discriminators
via computer simulation. Although the optimal quantization functions may be computed
for any m-dependent processes for which the marginal and bivariate distributions of the
data are known under both hypotheses, we consider only the case typical of radar
systems: $\rho$-mixing data from observations of the radar return envelope. $\rho$-mixing implies that
$\mathrm{Cov}\{Z_k, Z_{k+n}\} \le \rho_n$, where $\rho_n \to 0$ as $n \to \infty$. We shall assume that the data are
samples of the radar return envelope, which is either from a target (hypothesis $H_1$) or a
decoy (hypothesis $H_0$). Note that the radar has already detected the object (i.e., the target
or decoy), but now must decide whether the return is from a target or decoy. Note that in
our problems we neglect the possibility of detecting clutter or other objects. We define the
probability of false alarm as the probability that the discriminator declares a decoy to be the
target. The probability of miss is the probability that a target is declared a decoy. Thus the
probability of detection is the probability that the discriminator declares a target to be a target.

We consider two discrimination cases (refer to Table 2.1). Under Case 1, the target's
envelope samples have marginal pdfs which are lognormal, while the decoy's marginal pdfs
are Rayleigh. The observations under each hypothesis have matched means and powers.
For Case 2, both hypotheses have Rayleigh marginal pdfs. However, under Case 2, a 3 dB
($H_1$ vs. $H_0$) difference in power exists between the two hypotheses. The observations are
assumed stationary and $\rho$-mixing. Appendix C is a summary of the necessary marginal and
bivariate pdfs for the lognormal and Rayleigh processes.

Case     Marginal pdfs            Mean difference (dB)   Power difference (dB)   Decorrelation time τ
Case 1   Lognormal vs Rayleigh    0                      0                       0.130290, 0.013029
Case 2   Rayleigh vs Rayleigh                            3                       0.130290, 0.013029

Table 2.1: Discrimination Cases

The Rayleigh processes are generated by underlying Gaussian processes (i.e., the
inphase and quadrature components). We denote the envelope observations as $\{Z_i\}_{i=1}^{\infty}$.
The Rayleigh envelope process is generated by

$$
Z_i = \sqrt{X_i^2 + Y_i^2}, \quad i = 1, 2, 3, \ldots \tag{2.52}
$$

where $\{X_i\}_{i=1}^{\infty}$ and $\{Y_i\}_{i=1}^{\infty}$ are mutually independent Gaussian stationary $\rho$-mixing
processes. This implies that $\{Z_i\}_{i=1}^{\infty}$ is also stationary and $\rho$-mixing. The underlying Gaussians
are generated by

$$
\begin{aligned}
X_i &= \rho X_{i-1} + \sigma\sqrt{1 - \rho^2}\,V_i \\
Y_i &= \rho Y_{i-1} + \sigma\sqrt{1 - \rho^2}\,W_i, \quad \text{for } i = 2, 3, \ldots
\end{aligned}
\tag{2.53}
$$

with

$$
\begin{aligned}
X_1 &= \sigma V_1 \\
Y_1 &= \sigma W_1
\end{aligned}
\tag{2.54}
$$

where $\{V_i\}_{i=1}^{\infty}$ and $\{W_i\}_{i=1}^{\infty}$ are mutually independent sequences of i.i.d. (independent
and identically distributed) zero mean/unit variance Gaussian random variables. $\sigma$ is the
standard deviation of the underlying Gaussians, while $\rho$ is the correlation coefficient for
adjacent samples. Thus the underlying Gaussians are stationary $\rho$-mixing processes with
correlation coefficient $\rho$ for adjacent samples.

The correlation coefficient $\rho$ is related to the decorrelation time $\tau$ (see Table 2.1) in the
following manner: $\tau$ is defined to be the time it takes for the correlation coefficient between
the first sample and another sample to decay to $e^{-1}$. In our simulations, we
treat two samples as uncorrelated when the correlation coefficient between the first and
$j$-th radar samples drops below 0.1. Since the underlying processes are Gaussian, uncorrelated
samples are also independent. Thus we can assume m-dependence and define $m_1$ and $m_0$ as the
number of samples it takes for the correlation to drop below 0.1 under $H_1$ and $H_0$,
respectively.
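
As a rough illustration of how an m-dependence length follows from the correlation coefficient, the sketch below assumes a first-order recursion of the kind in (2.53), for which the correlation between samples $j$ apart is $\rho^j$. The function name `m_dependence_length` and its `threshold` argument are illustrative and do not come from the thesis.

```python
def m_dependence_length(rho, threshold=0.1):
    """Smallest lag j at which rho**j falls below the threshold.

    Assumes a first-order Gaussian recursion, so that the correlation
    between samples j apart is rho**j (an assumption of this sketch).
    """
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho must lie in [0, 1)")
    lag, corr = 1, rho
    while corr >= threshold:
        lag += 1
        corr *= rho
    return lag
```

A larger $\rho$ (slower decorrelation) gives a longer dependence length, which matches the ordering of the two hypotheses described in the text.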

For very large targets, the radar return envelope samples are often approximated by
a lognormal process. Our lognormal process is simulated by exponentiating an underlying
Gaussian process:

$Z_i = \exp(X_i + \mu), \quad i = 1, 2, 3, \ldots$  (2.55)

where $\{X_i\}_{i=1}^{\infty}$ is generated in the same manner as in equations (2.53) and (2.54). Unlike the
Rayleigh processes, which have underlying Gaussians with zero mean, the underlying
Gaussians for the lognormal process may have a nonzero mean $\mu$.
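
The generation steps in (2.52) through (2.55) can be sketched in NumPy as follows. This is a minimal sketch, not the simulation code used in the thesis: the function names, the seed, and the parameter values in the usage lines are illustrative, and the driving noise is scaled by $\sigma$ (an assumption of this sketch) so that the variance of the underlying Gaussians stays at $\sigma^2$ for every sample.

```python
import numpy as np

def ar1_gaussian(n, rho, sigma, rng):
    """First-order Gaussian sequence with adjacent-sample correlation rho."""
    x = np.empty(n)
    x[0] = sigma * rng.standard_normal()
    # Assumption of this sketch: the driving noise is scaled by sigma so
    # that the variance remains sigma**2 at every index.
    drive = sigma * np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
    for i in range(1, n):
        x[i] = rho * x[i - 1] + drive[i]
    return x

def rayleigh_envelope(n, rho, sigma, rng):
    """Z_i = sqrt(X_i^2 + Y_i^2) from two independent Gaussian sequences."""
    x = ar1_gaussian(n, rho, sigma, rng)
    y = ar1_gaussian(n, rho, sigma, rng)
    return np.hypot(x, y)

def lognormal_envelope(n, rho, sigma, mu, rng):
    """Z_i = exp(X_i + mu) from a single Gaussian sequence."""
    return np.exp(ar1_gaussian(n, rho, sigma, rng) + mu)

rng = np.random.default_rng(0)  # arbitrary seed for repeatability
z_decoy = rayleigh_envelope(1000, rho=0.9, sigma=1.0, rng=rng)
z_target = lognormal_envelope(1000, rho=0.99, sigma=0.5, mu=0.0, rng=rng)
```

The explicit loop keeps the recursion readable; a filter-based implementation would be faster for long sequences.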

To generate the quantizer functions, the marginal cdfs for each hypothesis are
required. To compute the matrices $P_1$ and $P_0$, the sum of bivariate cdfs over the m-dependence
interval must be computed for each hypothesis. Specifically, the sums $\sum_{j=1}^{m_i} F_i^{(1,j+1)}(x, y)$
must be computed for $i = 0, 1$, corresponding to $H_0$ and $H_1$, where $F_i^{(1,j+1)}(x, y)$ is the joint
cdf for samples $Z_1$ and $Z_{j+1}$ and $m_i$ is the m-dependence length for hypothesis $H_i$. The
decorrelation times listed in Table 2.1 imply that the m-dependence lengths are 300 and
30 for $H_1$ and $H_0$, respectively. The Rayleigh and lognormal marginal pdfs and sums of bivariate
pdfs for both Case 1 and Case 2 are evaluated on a discrete grid of evenly spaced points.
These points are chosen to lie in the support of the marginal density; that is, the maximum
and minimum sample values are computed so that the probability that a sample exceeds the
maximum value of the support or falls below the minimum value of the support is 0.00005.
A total of 301 grid points are used over the support.
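
The choice of grid endpoints can be illustrated for the Rayleigh marginal, whose cdf is $1 - \exp(-z^2 / 2\sigma^2)$. The sketch below assumes that the 0.00005 probability applies to each tail separately, which the text does not state explicitly; the function names are illustrative.

```python
import numpy as np

def rayleigh_cdf(z, sigma):
    """Rayleigh cdf for underlying Gaussians with standard deviation sigma."""
    return 1.0 - np.exp(-z**2 / (2.0 * sigma**2))

def rayleigh_quantile(p, sigma):
    """Inverse of the Rayleigh cdf."""
    return sigma * np.sqrt(-2.0 * np.log1p(-p))

def support_grid(sigma, tail=5e-5, points=301):
    """Evenly spaced grid between the lower and upper tail quantiles.

    Assumption of this sketch: `tail` is the probability left outside
    the grid on each side.
    """
    lo = rayleigh_quantile(tail, sigma)
    hi = rayleigh_quantile(1.0 - tail, sigma)
    return np.linspace(lo, hi, points)
```

An odd number of grid points such as 301 gives an even number of intervals, which is convenient for Simpson-type integration of the gridded densities.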

For each case, three classes of quantization functions are generated. All quantization
functions are chosen to maximize the performance measure
$S_3(Q) = |\mu_1 - \mu_0|^2 / [\sigma_1^2 + \sigma_0^2]$. The first class of quantizers has uniform breakpoints and
optimal levels. The second class of quantizers has optimal breakpoints and optimal levels.
Finally, the third class of quantizers was obtained by quantizing a continuous nonlinearity.
The continuous nonlinearity is quantized by

$$Q(x) = \begin{cases} g(t_0), & \text{if } x < t_0 \\ [g(t_i) + g(t_{i+1})]/2, & \text{if } t_i \le x < t_{i+1},\ i = 0, 1, \ldots, M-1 \\ g(t_M), & \text{if } x \ge t_M \end{cases} \qquad (2.56)$$

where $t_i$ are the breakpoints and where $g(x)$ is the continuous nonlinearity which maximizes
the performance measure $S_3(Q)$ (see [6]). The quantizer functions with uniform breakpoints
and optimal levels are computed via equation (2.28). The quantizer functions with optimal
levels and optimal breakpoints are computed using equation (2.28) and a gradient search
technique over varying breakpoints. Appendix A supplies some of the required derivatives
needed to compute the derivative of the performance measure for a gradient technique.
However, in the actual computations, our simulations used a finite difference gradient
technique.
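
The quantization rule (2.56) can be sketched as follows. The helper name `quantize_nonlinearity` is illustrative, and the nonlinearity `g` is assumed to accept NumPy arrays; the optimal nonlinearity of [6] is not reproduced here.

```python
import numpy as np

def quantize_nonlinearity(g, breakpoints):
    """Return Q(x) built from g and breakpoints t_0 < t_1 < ... < t_M.

    Q equals g(t_0) below t_0, the average of g at the two surrounding
    breakpoints inside [t_i, t_{i+1}), and g(t_M) at or above t_M.
    """
    t = np.asarray(breakpoints, dtype=float)
    gt = g(t)
    levels = np.concatenate(([gt[0]], 0.5 * (gt[:-1] + gt[1:]), [gt[-1]]))

    def Q(x):
        # side="right" maps t_i <= x < t_{i+1} to index i + 1.
        idx = np.searchsorted(t, x, side="right")
        return levels[idx]

    return Q
```

For example, with `g = lambda x: x**2` and breakpoints `[0, 1, 2]`, the value of `Q(0.5)` is the average of `g(0)` and `g(1)`.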


Quantization functions from the various classes are computed for various numbers
of levels. Tables 2.2 and 2.3 summarize the quantization functions computed. For each
quantization function listed in Tables 2.2 and 2.3, the corresponding performance measure
is also listed.

Quantization   Quantized continuous   Uniform         Optimal
Levels         nonlinearity           breakpoints     breakpoints
2              0.0000716237           0.0000761859    0.0000814555
4              0.0000435368           0.0026132100    0.0003927096
8              0.0000047076           0.0004527053    0.0102431728
16             0.0008984112           0.0019421706    0.0110042309
32             0.0071742339           0.0092516430    0.0111641670
64             0.0101809436           0.0109332995
128            0.0112417946           0.0113493744

Table 2.2: Values of $S_3$ for Case 1 Quantizers

Figure 2.1 is a graph of the performance measure versus the number of quantization
levels for each quantization class. As expected, as the number of levels increases the
performance measure also increases. Also, the performance measure saturates as the number
of quantization levels becomes very large. The results in Figure 2.1 are intuitively pleasing:
for a fixed number of quantization levels, the quantizer with optimal breakpoints and optimal
levels has a greater performance measure than the quantizer with uniform breakpoints and
optimal levels, which in turn has a greater performance measure than the quantized continuous
nonlinearity. This result is expected, since the quantized continuous nonlinearities with M
levels and uniform breakpoints are a subclass of the M-level quantizer functions with uniform
breakpoints, and quantizers with M levels and uniform breakpoints are a subclass of
quantizers with M levels.

Table 2.3: Values of $S_3$ for Case 2 Quantizers

Typical quantization functions are shown in Figures 2.3 to 2.10. Figure 2.3 is the
128-level uniform quantizer for Case 1. The 8-level uniform quantizer and the 8-level quantized
continuous nonlinearity do not have the abrupt changes for small and large z values that the
128-level quantizer has, but the optimal 8-level quantizer comes close to the shape of the
128-level quantizer. The differences among the quantizers for Case 2 are similar. Note that
the general shape of the quantized nonlinearity in Figure 2.10 seems to be different from the
other quantizers for Case 2. However, the general shape of the quantizer is the same;
it differs only by a scalar constant. (Note that a quantizer may be scaled without changing the
performance.)

The discriminator structure is depicted in Figure 2.11. A maximum number of samples
per test criterion was added to the sequential test for practicality. For Case 1
discriminators, the maximum number of samples permitted was 2000. For Case 2, the maximum
number of samples permitted was 4000. Each discriminator was tested using random data
sequences. Also, each case was evaluated with the desired error probabilities $\alpha = \beta = 10^{-2}$
and $\alpha = \beta = 10^{-3}$. When the discriminators were evaluated with $\alpha = \beta = 10^{-2}$ as the
desired error probabilities, 1000 random sample paths from each hypothesis were utilized.
For the discriminators designed for $\alpha = \beta = 10^{-3}$, 10000 random sample paths from each
hypothesis were utilized.
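
A truncated sequential test of the kind described above can be sketched as follows. The running statistic is the sum of quantizer outputs, compared against a lower threshold `a` and an upper threshold `b`. The decision taken when the sample limit is reached (comparison with the midpoint of the thresholds) is an assumption of this sketch, as are the function and argument names; the thesis text does not specify the truncation rule.

```python
def truncated_sequential_test(samples, Q, a, b, max_samples):
    """Return (decision, samples_used); decision 1 means H1, 0 means H0."""
    statistic = 0.0
    used = 0
    for z in samples:
        if used >= max_samples:
            break
        statistic += Q(z)
        used += 1
        if statistic >= b:
            return 1, used
        if statistic <= a:
            return 0, used
    # Assumed truncation rule: compare with the midpoint of the thresholds.
    return (1 if statistic >= 0.5 * (a + b) else 0), used
```

Averaging `samples_used` over many simulated paths from one hypothesis gives an estimate of the average sample number reported in the results tables.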

Figures 2.12 through 2.15 are examples of simulated paths from each hypothesis.
Tables 2.4 through 2.7 summarize the results from the simulations for the quantizer
discriminators. Listed for each discriminator are the probability of miss, probability of detection,
expected number of samples to make a decision, and the performance measure. Examining
the results, one can see that generally, as the number of quantization levels increases, the
performance of the discriminator improves.

Examining the results from Case 1, we see that the minimum number of quantization
levels for a uniform quantization function to result in good performance was 32. The
quantization function with 32 levels designed for $P_f = P_m = 10^{-2}$ had $P_f = 0.003$, $P_d = 0.991$, and
an average sample number of 516. The quantization functions with fewer levels had $P_f \approx 1$.
The quantizer function with optimal breakpoints and levels required only 8 quantization
levels to result in reasonable performance, while the quantized continuous nonlinearity required
32 levels to yield reasonable performance. For the Case 1 quantizer discriminators with
desired $P_f = P_m = 10^{-3}$, error probabilities were slightly less than those of the quantizer
discriminators with $P_f = P_m = 10^{-2}$, but the average sample numbers increased; this is
expected since the decision thresholds move farther apart for smaller desired error
probabilities. For Case 2, the minimum number of quantization levels for good performance is 4
for both optimal and uniform quantization functions.

Figure 2.1: Performance Measures for Case 1 Quantizers

Figure 2.2: Performance Measures for Case 2 Quantizers
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Figure  2.5:  8-Level  Uniform 


Figure 2.6: 8-Level

Quantized continuous nonlinearity
Levels   $P_d$    Avg. samples   $S_3$
2        0.007    1595           0.0000716237
4        0        2000           0.0000435368
8        0.191    2000           0.0000047076
16       0.965    1928           0.0008984112
32       0.972    617            0.0071742339
64       0.977    473            0.0101809439
128      0.983    488            0.0112417950

Uniform breakpoints
Levels   Avg. samples   $S_3$
2        1997           0.0000761859
4        1998           0.0002613210
8        1986           0.0004527053
16       1573           0.0019421706
32       516            0.0092516428
64       527            0.0109332995
128      537            0.0113493742
$P_f$ / $P_d$ entries: 0.003; 0.996, 0.976, 0.991, 0.985, 0.988

Optimal breakpoints
Levels   $S_3$
2        0.0000814555
4        0.0003927096
8        0.0102431725
16       0.0110042309
32       0.0111641665
$P_f$ / $P_d$ entries: 1, 0.025; 0.998, 0.995, 0.984, 0.987

Table 2.4: Results for Case 1 Quantizers for desired $P_f = P_m = 10^{-2}$
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Table  2.5:  Results  for  Case  1  Quantizers  for  desired  Pj  =  Pm  =  10~3 
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0.003 
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0.996 
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3142 


2420 


2375 


2336 


2331 


2338 


2323 


0.0009184310 


0.0029171822 


0.0029905121 


0.0029982539 


0.0029995114 


0.0029998184 


0.0029998949 


0.0029278097 


0.0029428344 


0.0029954809 


0.0029988189 


0.0029996179 


0.0009132413 


0.0028830940 


0.0029869056 


0.0029979990 


0.0029994954 


0.0029998174 


0.0029998948 


Table 2.6: Results for Case 2 Quantizers for desired $P_f = P_m = 10^{-2}$
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Uniform 
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0.0029869056 


0.0029979990 


0.0029994954 


0.0029998174 


0.0029998948 


Table 2.7: Results for Case 2 Quantizers for desired $P_f = P_m = 10^{-3}$
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Sample  Number 


Figure 2.12: Sample Path from $H_0$


Figure 2.13: Sample Path from $H_1$
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Chapter  3 


Estimation  and 
Discrimination 


In the previous chapter we developed quantization functions which optimized the
performance measure $S_3$ defined in equation (2.7). As mentioned in that chapter, the marginal
cdfs of the data under each hypothesis were required to solve for the quantization functions.
Also required were the sums $\sum_{j=1}^{m_i} F_i^{(1,j+1)}(x, y)$, where $F_i^{(1,j+1)}(x, y)$ was the joint cdf of
the data for samples $Z_1$ and $Z_{j+1}$ under hypothesis $H_i$, and $m_i$ was the m-dependence length
under $H_i$. For the continuous nonlinearities of [5], [6], pdfs rather than cdfs were required.

The results presented in the previous chapter were obtained by using the actual cdfs
of the various discrimination cases. However, in this chapter, it is assumed that the cdfs
of the data are not known. This is a more realistic problem, since in many engineering
problems the distributions of the data are not available. Therefore, in this chapter the pdfs
of the data will be estimated from the training data introduced in Chapter 1, and the cdfs
will be obtained via numerical integration techniques. Then quantization functions will be
computed and implemented in simulated discriminators for evaluation. Thus the feasibility
of estimation and discrimination techniques with memoryless quantizer discriminators is
addressed for the discrimination cases of the previous chapter.


3.1  Kernel  Density  Estimators 


For the estimation of our pdfs we utilize kernel density estimators. The idea behind these
estimators is that each observation $X_k$ is replaced by a function of $X_k$. Then the
functions are summed to yield the estimate of the density $f(x)$. The kernel function produces a
smoothing effect and, if the kernel satisfies certain constraints, the estimate will also have
desirable properties. For our application, the main advantage of the kernel density estimator
over a histogram method is the smoothing characteristic. The kernel density estimators are
introduced in the following paragraph.

Given the data observations $X_1, X_2, \ldots, X_n$, it is desired to estimate the marginal
pdf of the data, $f(x)$. It is assumed that the process $\{X_k\}$ is stationary. The kernel
density estimate, denoted $\hat{f}(x; n)$, where $n$ represents the number of observations used by
the estimator, is defined as

$$\hat{f}(x; n) = \frac{1}{n h_n} \sum_{k=1}^{n} K\!\left(\frac{x - X_k}{h_n}\right) \qquad (3.1)$$

The function $K(\cdot)$ is called the kernel function and $h_n$ is usually referred to as the window
width or bandwidth parameter.

Under certain conditions the kernel estimate has been shown to be asymptotically
unbiased and strongly consistent. Asymptotically unbiased means that

$$\lim_{n \to \infty} E[\hat{f}(x; n)] = f(x) \qquad (3.2)$$

and strongly consistent means that

$$\lim_{n \to \infty} \hat{f}(x; n) = f(x) \quad \text{with probability 1.} \qquad (3.3)$$

These two characteristics are desirable for an estimator since they imply that more
observations improve the estimator's accuracy.
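
A kernel density estimate of this form can be sketched with a Gaussian kernel, which is a density and therefore a valid choice of $K(\cdot)$. The function name and the explicit bandwidth argument `h` are illustrative; the bandwidth rule used in the thesis is not reproduced here.

```python
import numpy as np

def gaussian_kernel_estimate(x_grid, observations, h):
    """Kernel density estimate on x_grid using a Gaussian kernel.

    Each observation contributes a Gaussian bump of width h centered on
    it; the bumps are summed and normalized by n * h.
    """
    x = np.asarray(x_grid, dtype=float)[:, None]
    obs = np.asarray(observations, dtype=float)[None, :]
    u = (x - obs) / h
    kernel_sum = np.exp(-0.5 * u**2).sum(axis=1) / np.sqrt(2.0 * np.pi)
    return kernel_sum / (obs.size * h)
```

A smaller `h` follows the data more closely but gives a rougher estimate, which is the trade-off behind the conditions placed on $h_n$.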

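[8] shows that, if $X_1, X_2, \ldots, X_n$ are independent and identically distributed, and if
(1) $K(\cdot)$ is a density, that is, $\int_{-\infty}^{\infty} K(x)\,dx = 1$ and $K(x) \ge 0,\ \forall x$
(2) $\lim_{|x| \to \infty} |x| K(x) = 0$
(3) $\sup_x K(x) < \infty$
(4) $\lim_{n \to \infty} h_n = 0$
(5) $\lim_{n \to \infty} n h_n = \infty$
(6) $\sum_{n=1}^{\infty} \exp(-\alpha n h_n) < \infty,\ \forall \alpha > 0$

then

$$\lim_{n \to \infty} E[\hat{f}(x; n)] = f(x)$$

and

$$\lim_{n \to \infty} \hat{f}(x; n) = f(x) \quad \text{with probability 1.} \qquad (3.4)$$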

Under various conditions, the kernel density estimators have also been shown to be
asymptotically unbiased and consistent in the quadratic mean sense for asymptotically
independent/uncorrelated data (see [9]). Quadratic mean consistent means that

$$\lim_{n \to \infty} E\left[\left(\hat{f}(x; n) - f(x)\right)^2\right] = 0. \qquad (3.5)$$

One case of asymptotic independence used in [9] that is of interest to our problem is
strong-mixing. Strong-mixing is now defined. Consider a continuous time random process
$X(t)$. Let $\mathcal{F}_a^b = \sigma(X(t),\ a \le t \le b)$ denote the $\sigma$-algebra of events generated by the
random variables $\{X(t),\ a \le t \le b\}$, $-\infty \le a < b \le \infty$. The stationary process $X(t)$ is
strong-mixing if, for $\tau > 0$,

$$\sup_{A \in \mathcal{F}_{-\infty}^{0},\ B \in \mathcal{F}_{\tau}^{\infty}} |P[AB] - P[A]P[B]| = \alpha(\tau)$$

where

$$\lim_{\tau \to \infty} \alpha(\tau) = 0. \qquad (3.6)$$

$\alpha(\tau)$ characterizes the mixing rate and is referred to as the mixing coefficient. The above
definition basically states that two events $A$ and $B$, determined by portions of the process
separated by a gap $\tau$, become asymptotically independent as the separation grows. We assume
that the data $X_1, X_2, \ldots, X_n$ are observations of the process $X(t)$ obtained by uniform sampling.

[9] shows that, if

(1) $K(\cdot)$ is a density, that is, $\int_{-\infty}^{\infty} K(x)\,dx = 1$ and $K(x) \ge 0,\ \forall x$
(2) $\lim_{|x| \to \infty} K(x) = 0$
(3) $\sup_x K(x) < \infty$
(4) $\lim_{n \to \infty} h_n = 0$
(5) $\lim_{n \to \infty} n h_n = \infty$
(6) $\int_0^{\infty} [\alpha(\tau)]^q\,d\tau < \infty$, for $0 < q < \tfrac{1}{2}$

then

$$\lim_{n \to \infty} E[\hat{f}(x; n)] = f(x)$$

and

$$\lim_{n \to \infty} E\left[\left(\hat{f}(x; n) - f(x)\right)^2\right] = 0. \qquad (3.7)$$

This result is useful to our problem in Chapter 2, where we assumed that our process was
m-dependent (i.e., $X_k$ and $X_l$ have known correlation for $|k - l| \le m$, while $X_k$ and $X_l$
are independent for $|k - l| > m$). Since an m-dependent process satisfies (3.6), it is also a
strong-mixing process. Therefore the results of [9] are useful for our problem.

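For the bivariate kernel density estimators, vector observations of the form
$\mathbf{X} = (X_1, X_2)^T$ are required. Given the observations $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_n$, the kernel density estimate
of the bivariate pdf $f(\mathbf{x})$ is obtained by

$$\hat{f}(\mathbf{x}; n) = \frac{1}{n h_n^2} \sum_{k=1}^{n} K\!\left(\frac{\mathbf{x} - \mathbf{X}_k}{h_n}\right) \qquad (3.8)$$

For independent identically distributed vectors, the estimator of (3.8) is also asymptotically
unbiased and strongly consistent [8]. That is, if
(1) $K(\cdot)$ is a density on $R^2$
(2) $\lim_{\|\mathbf{x}\| \to \infty} \|\mathbf{x}\|^2 K(\mathbf{x}) = 0$
(3) $\sup_{\mathbf{x} \in R^2} K(\mathbf{x}) < \infty$
(4) $\lim_{n \to \infty} h_n = 0$
(5) $\lim_{n \to \infty} n h_n = \infty$
(6) $\sum_{n=1}^{\infty} \exp(-\alpha n h_n) < \infty,\ \forall \alpha > 0$

then

$$\lim_{n \to \infty} E[\hat{f}(\mathbf{x}; n)] = f(\mathbf{x})$$

and

$$\lim_{n \to \infty} \hat{f}(\mathbf{x}; n) = f(\mathbf{x}) \quad \text{with probability 1.} \qquad (3.9)$$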


3.2  Implementation  of  Kernel  Density  Estimators 


To estimate our pdfs, we utilize the training data. Denote the estimates of the marginals
as $\hat{f}_1(x; n)$ and $\hat{f}_0(x; n)$ for $H_1$ and $H_0$, respectively. Also denote the estimates of the

Note that, in equations (3.10) and (3.11), we average over the $M$ independent sample
paths of the training data. For the bivariates, we utilize pairs of observations, $\zeta_{m,k}$ and $\zeta_{m,k+j}$,
which are $j$ samples apart, to estimate $\hat{f}_1^{(1,j+1)}(x, y; n)$ under hypothesis $H_1$.

Since the estimators cannot practically be implemented to estimate continuous
functions, we implement them to estimate the pdfs at a discrete set of gridpoints. Denote these
gridpoints as $x_0, x_1, \ldots, x_{G-1}$, where $G$ is the total number of points. So, using (3.10) and
(3.11) we form the set of estimates


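$$\hat{f}_1(x_i; N),\ \hat{f}_0(x_i; N), \quad \text{for } i = 0, 1, \ldots, G-1 \qquad (3.15)$$

and

$$\hat{f}_1^{(1,j+1)}(x_i, x_l; N-j),\ \hat{f}_0^{(1,j+1)}(x_i, x_l; N-j), \quad \text{for } i, l = 0, 1, \ldots, G-1. \qquad (3.16)$$
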

Using equations (3.15) and (3.16) the estimators can easily be implemented in a
digital computer simulation. Some computers are now available with vector processing
capabilities, which greatly decrease processing time. Equations (3.15) and (3.16) can be
easily vectorized as

$$\begin{bmatrix} \hat{f}_1(x_0; N) \\ \vdots \\ \hat{f}_1(x_{G-1}; N) \end{bmatrix} = \frac{1}{\sqrt{2\pi}\, N M h_N} \sum_{m=0}^{M-1} \left[ \boldsymbol{\Phi}_{m,0} + \boldsymbol{\Phi}_{m,1} + \cdots + \boldsymbol{\Phi}_{m,(N-1)} \right] \qquad (3.17)$$

and

$$\begin{bmatrix} \hat{f}_1^{(1,j+1)}(x_0, x_0; N-j) \\ \hat{f}_1^{(1,j+1)}(x_0, x_1; N-j) \\ \vdots \\ \hat{f}_1^{(1,j+1)}(x_0, x_{G-1}; N-j) \\ \hat{f}_1^{(1,j+1)}(x_1, x_0; N-j) \\ \vdots \\ \hat{f}_1^{(1,j+1)}(x_{G-1}, x_{G-1}; N-j) \end{bmatrix} = \frac{1}{2\pi (N-j) M h_{N-j}^2} \sum_{m=0}^{M-1} \left[ \boldsymbol{\chi}_{m,0} + \boldsymbol{\chi}_{m,1} + \cdots + \boldsymbol{\chi}_{m,(N-j-1)} \right] \qquad (3.18)$$

where we have defined

$$\boldsymbol{\Phi}_{m,i} = \begin{bmatrix} \exp\left\{ -\frac{1}{2h_N^2} (x_0 - \zeta_{m,i})^2 \right\} \\ \exp\left\{ -\frac{1}{2h_N^2} (x_1 - \zeta_{m,i})^2 \right\} \\ \vdots \\ \exp\left\{ -\frac{1}{2h_N^2} (x_{G-1} - \zeta_{m,i})^2 \right\} \end{bmatrix} \qquad (3.19)$$

and

$$\boldsymbol{\chi}_{m,i} = \begin{bmatrix} \exp\left\{ -\frac{1}{2h_{N-j}^2} \left( (x_0 - \zeta_{m,i})^2 + (x_0 - \zeta_{m,(i+j)})^2 \right) \right\} \\ \exp\left\{ -\frac{1}{2h_{N-j}^2} \left( (x_0 - \zeta_{m,i})^2 + (x_1 - \zeta_{m,(i+j)})^2 \right) \right\} \\ \vdots \\ \exp\left\{ -\frac{1}{2h_{N-j}^2} \left( (x_0 - \zeta_{m,i})^2 + (x_{G-1} - \zeta_{m,(i+j)})^2 \right) \right\} \\ \exp\left\{ -\frac{1}{2h_{N-j}^2} \left( (x_1 - \zeta_{m,i})^2 + (x_0 - \zeta_{m,(i+j)})^2 \right) \right\} \\ \vdots \\ \exp\left\{ -\frac{1}{2h_{N-j}^2} \left( (x_{G-1} - \zeta_{m,i})^2 + (x_{G-1} - \zeta_{m,(i+j)})^2 \right) \right\} \end{bmatrix}. \qquad (3.20)$$

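
The vectorized estimators map naturally onto array operations. The NumPy sketch below assumes the training data are stored as an array `paths` with one row per sample path, uses Gaussian kernels, and takes the bandwidth `h` as an argument; the function names and the normalization of the bivariate product kernel ($2\pi h^2$ per pair) are assumptions of this sketch rather than details confirmed by the text.

```python
import numpy as np

def marginal_estimate(grid, paths, h):
    """Marginal pdf estimate on `grid`, averaged over all sample paths."""
    grid = np.asarray(grid, dtype=float)
    paths = np.asarray(paths, dtype=float)
    m, n = paths.shape
    # (G, M*N) kernel evaluations, summed over every training sample.
    diff = grid[:, None] - paths.reshape(1, -1)
    total = np.exp(-0.5 * (diff / h) ** 2).sum(axis=1)
    return total / (np.sqrt(2.0 * np.pi) * n * m * h)

def bivariate_estimate(grid, paths, j, h):
    """Joint pdf estimate for samples j apart, on the grid x grid lattice."""
    grid = np.asarray(grid, dtype=float)
    paths = np.asarray(paths, dtype=float)
    m, n = paths.shape
    first = paths[:, : n - j].ravel()   # earlier sample of each pair
    second = paths[:, j:].ravel()       # sample j steps later
    kx = np.exp(-0.5 * ((grid[:, None] - first[None, :]) / h) ** 2)
    ky = np.exp(-0.5 * ((grid[:, None] - second[None, :]) / h) ** 2)
    # The matrix product sums the product kernel over all pairs at once.
    total = kx @ ky.T
    return total / (2.0 * np.pi * (n - j) * m * h**2)
```

Entry `[i, l]` of the bivariate result corresponds to the grid pair $(x_i, x_l)$, so the $G^2$-element vector in the text is obtained by flattening the array row by row.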

3.3  Numerical  Results 


In this section, the performance of memoryless quantizer discriminators based upon
estimated pdfs is evaluated via computer simulation. Equations (3.17) through (3.20) were
implemented on a Convex 210 mini-supercomputer capable of vector processing. The training
data introduced in Chapter 1 were fed into the simulations to obtain estimates of the
necessary pdfs. These pdfs were then integrated via Simpson's rule to obtain the
cdfs required to derive the quantization functions. Next we consider the consistency of the
estimators.
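
The integration step can be sketched with a composite Simpson's rule applied to a gridded pdf. The function below is illustrative (its name and interface are not from the thesis); it returns the cdf at every second grid point, which is where the composite rule is defined, and requires an odd number of gridpoints.

```python
import numpy as np

def simpson_cdf(pdf_values, dx):
    """Cumulative Simpson integral at grid indices 0, 2, 4, ...

    `pdf_values` must hold an odd number of evenly spaced samples so that
    the grid splits into whole pairs of intervals.
    """
    f = np.asarray(pdf_values, dtype=float)
    if f.size < 3 or f.size % 2 == 0:
        raise ValueError("need an odd number (>= 3) of grid points")
    # Simpson's rule over each pair of adjacent intervals.
    panels = dx / 3.0 * (f[0:-2:2] + 4.0 * f[1:-1:2] + f[2::2])
    return np.concatenate(([0.0], np.cumsum(panels)))
```

Grid sizes such as 33, 65, or 301 points satisfy the odd-count requirement.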
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Figure  3.1:  Mean  Squared  Error  for  an  Estimated  Marginal  Density  Function 

Using the estimator simulations, the consistency of the marginal estimators was
checked for a Rayleigh density. Equations (2.52) through (2.54) were utilized to generate
correlated data sequences with a Rayleigh marginal density. These data were then fed into
the estimators. The densities were evaluated at 65 gridpoints over the interval (0.02, 12). The
lower limit, 0.02, was placed just below the minimum observed sample, and the upper limit,
12, was set just above the maximum observed sample. The constant c in equation (3.14)
was set to 0.1. Figure 3.1 shows the mean squared error as a function of the number of
samples used by the estimator. Notice that, as the number of samples increases, the mean
squared error decreases (apparently exponentially towards zero). This result supports the
notion of quadratic mean consistency of the marginal density estimator for correlated data.
Due to computer processing limitations, the consistency of bivariate pdf estimates could not
be checked. Figure 3.2 depicts the nominal density, the estimated density for 1,000 samples,
and the estimated density for 100,000 samples. Clearly the estimate for 100,000 samples is
closer to the nominal density than the estimate for 1,000 samples.

Figure 3.2: Nominal and Estimated Marginal Probability Density Functions




With some confidence that the estimators produce reasonable estimates of the pdfs,
we now consider our discrimination cases. Using the training data introduced in Chapter 1
and the estimator simulations, estimates of the marginal and bivariate pdfs for each
hypothesis of Case 1 and Case 2 were formed. These were computed over the interval (0.02, 12) at
33 gridpoints for each case. The interval was chosen in the same manner as described in the
preceding paragraph. Figures 3.3 and 3.4 depict the nominal and estimated marginal pdfs
for each hypothesis for Case 1 and Case 2, respectively. For both the marginal and bivariate
estimators, the constant c in (3.14) was set to 0.1. The bivariate pdfs in equation (3.18)
were computed for $j = 1, 2, \ldots, 30$ for $H_0$, and $j = 1, 2, \ldots, 150$ for $H_1$. The choices of
the maximum $j$ were due to computation restrictions. A better method of choosing $j$ would
have been to estimate the decorrelation time under each hypothesis and use those values
for the maximum choice of $j$.

After the marginal pdf estimates were obtained, cdfs were computed via Simpson's
integration. The bivariate pdfs were summed over $j$ for each hypothesis and then
integrated in two dimensions (also using Simpson's integration) to yield the necessary sums
of joint cdfs required for the optimum quantization function.

Figures 3.5 and 3.6 show the quantization functions computed for Case 1 and Case 2,
respectively, using the expressions given in Chapter 2. Comparing these to the 128-level
uniform quantizers from Chapter 2 (see Figures 2.3 and 2.7), some similarities can be noted.
For the quantizer of Case 1 derived from estimated pdfs, note that the drop for small values
of x is still present. The sharp incline for large values of x is still present for values of x
between 10.5 and 11.5. The function is relatively flat for values of x between 1 and 10.5.
However, note the drop for values of x for the last two quantization levels. This drop may be
attributed to the inaccuracies of the estimates of the pdfs in the tails of the densities. The
Case 2 quantizer also has inaccuracies out at its tails. Table 3.1 lists the performance of
the memoryless quantizer discriminators using the functions of Figures 3.5 and 3.6. The
thresholds a and b were set for desired probabilities of error of $10^{-3}$. Despite having low
probabilities of error, these discriminators performed poorly when the average sample size
was considered. The Case 1 discriminator required an average of 3400 samples to make
a decision, while the discriminator for Case 2 required an average of 4270 samples for a
decision. The quantizers from Chapter 2, which were derived from nominal pdfs, required
an average of 782 and 2660 samples for Case 1 and Case 2, respectively, for
comparable probabilities of error.

Figure 3.3: Nominal and Estimated Marginal Probability Density Functions for Case 1

Figure 3.4: Nominal and Estimated Marginal Probability Density Functions for Case 2
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Chapter  4 


Neural  Network 
Discriminators 


The memoryless discriminators derived in the preceding sections can be easily implemented
in practice because they only require estimating first and second order probability density
functions of the observed process under $H_1$ and $H_0$. These memoryless discriminators use
nonlinear functions of one variable, with the form $Q(x)$, which are chosen to maximize
a performance measure and are derived from first and second order probability density
functions. The nonlinearities are used in the test statistic of the discriminators as follows:

$$T_n = \sum_{j=1}^{n} Q(Z_j). \qquad (4.1)$$

Similar nonlinear functions, which have memory and have the form

$$\gamma(x_1, x_2, \ldots, x_K),$$

could be derived to optimize the same performance measures. Test statistics of the form

$$T_n = \sum_{j=1}^{n} \gamma(Z_{K-1+j}, Z_{K-2+j}, \ldots, Z_j) \qquad (4.2)$$

could be computed using nonlinear functions of $K$ variables. These functions, however,
would require higher-order probability density functions to be estimated (see [10]). In
practice, only first and second order probability density functions can be easily obtained with
reasonable accuracy for a small amount of training data.

In this section, we restrict the class of nonlinearities $\gamma(x_1, x_2, \ldots, x_K)$ to have a
maximum absolute value, which is not an unrealistic limitation in a real system. Then we use a
perceptron neural network to form our nonlinearity and the back-propagation algorithm to
minimize our performance measure.
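
The test statistic with memory in (4.2) amounts to applying a nonlinearity to a sliding window of $K$ consecutive samples and summing the results. The sketch below is illustrative: the function name is hypothetical, `gamma` stands for any user-supplied nonlinearity of $K$ arguments, and the windowing relies on NumPy's `sliding_window_view` (available in NumPy 1.20 and later).

```python
import numpy as np

def windowed_statistic(z, gamma, K):
    """Sum of gamma(Z_{K-1+j}, ..., Z_j) over all complete windows of z."""
    z = np.asarray(z, dtype=float)
    windows = np.lib.stride_tricks.sliding_window_view(z, K)
    # Each window holds (Z_j, ..., Z_{K-1+j}); reverse it so the newest
    # sample comes first, matching the argument order in (4.2).
    return sum(gamma(w[::-1]) for w in windows)
```

With `K = 1` the statistic reduces to the memoryless form (4.1), with `gamma` playing the role of $Q$.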

4.1  Perceptron  Neural  Networks 


Perceptron neural networks are interconnected layers of simple processing units called
perceptrons. A perceptron is illustrated in Figure 4.1. The perceptron takes an input vector
$\mathbf{x} = (x_0, x_1, \ldots, x_{K-1})^T$ and a weighting vector $\mathbf{w} = (w_0, w_1, \ldots, w_{K-1})^T$ and forms a dot
product

$$\sum_{i=0}^{K-1} x_i w_i = \mathbf{x}^T \mathbf{w}. \qquad (4.3)$$

From the dot product, an offset value $\theta$ is subtracted to get the result $y = \mathbf{x}^T \mathbf{w} - \theta$; $y$ is
then passed through a sigmoidal nonlinearity of the form

$$f(y) = \frac{1}{1 + e^{-y}}. \qquad (4.4)$$

The sigmoidal function is shown in Figure 4.2. Note that, throughout this thesis, we use the
terms perceptron and node interchangeably. We also refer to the offset value $\theta$ as the node
offset value.
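
A single perceptron of this kind is only a few lines of code. The sketch below assumes the logistic form of the sigmoid, whose output lies between 0 and 1; the function names are illustrative.

```python
import numpy as np

def sigmoid(y):
    """Logistic sigmoid, mapping any real y into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-y))

def perceptron(x, w, theta):
    """Dot product of inputs and weights, minus the offset, then sigmoid."""
    y = np.dot(x, w) - theta
    return sigmoid(y)
```

An input lying exactly on the hyperplane $\mathbf{x}^T \mathbf{w} = \theta$ produces an output of 0.5, and inputs far from it produce outputs close to 0 or 1.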


Figure 4.1: A Perceptron


To gain an understanding of what a perceptron does, consider a perceptron with two
inputs, $x_0$ and $x_1$. This implies that the perceptron has two weights, $w_0$ and $w_1$. To simplify
the analysis, replace the sigmoidal curve with a hard quantizer

$$f_h(x) = \begin{cases} 1, & \text{if } x > 0; \\ 0, & \text{otherwise.} \end{cases} \qquad (4.5)$$

So the output of the perceptron is either a 0 or 1. Figure 4.3 shows the $x_0$, $x_1$ plane. The
perceptron with a hard quantizer actually forms two decision regions separated by the line

$$x_1 = -\frac{w_0}{w_1} x_0 + \frac{\theta}{w_1}. \qquad (4.6)$$

$(x_0, x_1)$ pairs on one side of the line result in a perceptron output of 1, while pairs on
the other side of the line result in an output of 0. If the perceptron had $K$ inputs, the
decision boundary would become a hyperplane in $R^K$. Note that the location of the hyperplane
separating the decision regions is determined only by the weights $\mathbf{w}$ and the offset value $\theta$.

Figure 4.2: A Sigmoid Nonlinearity

If  the  hard  limiter  above  is  replaced  by  the  sigmoidal  nonlinearity,  then  the  decision 
regions  become  soft.  That  is,  input  vectors  near  the  hyperplane  have  outputs  that  are  near 
Input  vectors  taken  farther  away  from  the  hyperplane  have  outputs  that  approach  0  or 
1.  depending  on  which  side  of  the  hyperplane  they  lie. 

More complex decision regions can be formed by utilizing multiple hyperplanes. Decision
regions can be formed by using a perceptron to form each hyperplane of a complex
region. The output of each perceptron can then be fed into an AND gate or, better yet,
another perceptron with weights and an offset appropriately set to simulate an AND function.
This leads to the concept of multi-layer perceptron neural networks.
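As a minimal sketch of this construction, the Python fragment below bounds a triangle with three hard-limiter perceptrons and combines them with a fourth perceptron acting as an AND. The particular half-planes, the unit weights, and the offset of n - 0.5 on the second-layer node are illustrative assumptions, not values taken from the text.

```python
def hard_limiter(x):
    """Hard quantizer: 1 if x > 0, otherwise 0."""
    return 1 if x > 0 else 0

def perceptron(inputs, weights, theta):
    """Hard-limiter perceptron: weighted sum minus offset, then quantize."""
    return hard_limiter(sum(w * x for w, x in zip(weights, inputs)) - theta)

# First layer: three half-planes bounding a triangle (hypothetical weights and offsets).
HALF_PLANES = [
    ((1.0, 0.0), 0.0),     # fires when x0 > 0
    ((0.0, 1.0), 0.0),     # fires when x1 > 0
    ((-1.0, -1.0), -1.0),  # fires when x0 + x1 < 1
]

def in_triangle(point):
    """Return 1 if the point lies inside the triangle, otherwise 0."""
    first_layer = [perceptron(point, w, t) for w, t in HALF_PLANES]
    n = len(first_layer)
    # Second layer: unit weights with offset n - 0.5 fire only when
    # every first-layer output is 1, which simulates an AND function.
    return perceptron(first_layer, (1.0,) * n, n - 0.5)
```

For example, `in_triangle((0.2, 0.2))` evaluates to 1, while a point that violates any one of the three half-plane conditions evaluates to 0.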
Figure 4.3: Decision Space for a Perceptron

Multiple-layer perceptron neural networks take the outputs of the perceptrons on one layer
and use them as inputs to the next higher layer of perceptrons (see Figure 4.4). Networks
of this type are usually called feed-forward neural networks. As demonstrated in the above
discussion, a single perceptron can only divide the decision space with a hyperplane. But it
has been shown that a two-layer perceptron neural network can form any convex decision
region [11]. A convex region is a region in which any two points can be connected by a
line segment that lies entirely within the region. A third layer of nodes can allow the network
to form any arbitrary decision region [11] (assuming enough nodes are allocated to the correct
layers).

Figure 4.4: A Multiple-Layer Perceptron

To form a desired decision region, the weights and node offset values for each node in
each layer of a neural network must be specified. This would be a difficult task even if the
decision region were known. But, for many problems, the decision region is not known because
the statistical models of the data are not known. Training algorithms to form appropriate
decision regions exist for perceptron neural networks. These algorithms typically present the
training data to the network along with a desired response, and the network weight values
and node offset values are adjusted to force the actual network response towards the desired
response. One such algorithm is the back-propagation algorithm. The back-propagation
algorithm is a gradient search method (searching over $w$ and $\theta$), which minimizes the squared
error of the neural network outputs [12]. Note that the back-propagation algorithm requires
the nodes to have sigmoidal nonlinearities. (See Appendix B for a description of the
back-propagation algorithm.)

4.2 The Neural Network Sequential Discriminator


As mentioned in the introduction to this chapter, optimal nonlinearities
$\gamma(z_1, z_2, \ldots, z_K)$ could be derived for use in a discriminator using the test statistic
$T_n = \sum_{j=K}^{n} \gamma(Z_{K-1+j}, Z_{K-2+j}, \ldots, Z_j)$. Reference [10] derived one-step memory
nonlinearities for use in a block discrimination scheme to form the test statistic
$T_n = \sum_{j=2}^{n} \gamma(Z_{j-1}, Z_j)$.
However, in general these nonlinearities require knowledge (or estimation) of the pdfs of the
data of degrees higher than two. Nonlinearities in [10] require pdfs of the data of the fourth
degree under each hypothesis.

We now consider a suboptimal approach that utilizes perceptron neural networks and
yields excellent performance. We start by defining the structure of our sequential
discriminator. Our discriminator utilizes a test statistic of the form

$$T_n = \sum_{j=K}^{n} \gamma(Z_{K-1+j}, Z_{K-2+j}, \ldots, Z_j).$$

A two-threshold test is implemented, using the constants $a$ and $b$. So, upon obtaining a new
data sample, $Z_n$, the discriminator computes the test statistic $T_n$. If $T_n$ reaches $b$, then the
discriminator chooses $H_1$ and terminates the test. If $T_n$ drops to $a$, then the test terminates
and the discriminator chooses $H_0$. If $T_n$ lies between $a$ and $b$, then another sample $Z_{n+1}$
is obtained, $T_{n+1}$ is computed, and the entire test is repeated. This process continues until
either a decision is made, or the $N$-th sample is reached. Upon obtaining the $N$-th sample,
$T_N$ is computed and a one-threshold test is performed. Obviously, $T_{K-1}$ is initialized to a
value in the interval $(a, b)$.
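The truncated two-threshold procedure can be sketched in Python as follows. The function name, the argument names, and the convention that the final one-threshold test declares $H_1$ only when the statistic is strictly above the threshold are assumptions made for illustration; the increments stand for the successive values of the nonlinearity.

```python
def sequential_test(increments, a, b, t_init=0.0, final_threshold=0.0):
    """Truncated two-threshold test.

    increments: the successive nonlinearity values (at most N of them).
    Returns (decision, increments_used), where decision 1 means H1 and 0 means H0.
    """
    if not a < t_init < b:
        raise ValueError("the initial statistic must lie in the interval (a, b)")
    t = t_init
    used = 0
    for used, g in enumerate(increments, start=1):
        t += g
        if t >= b:   # upper threshold reached: choose H1 and stop
            return 1, used
        if t <= a:   # lower threshold reached: choose H0 and stop
            return 0, used
    # No threshold was crossed before the data ran out: one-threshold test.
    return (1 if t > final_threshold else 0), used
```

For instance, with thresholds -3 and 3 and increments that are all +1, the test stops after three increments and declares $H_1$.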

We now restrict the class of nonlinearities of the form $\gamma(x_1, x_2, \ldots, x_K)$ to have a
range with maximum absolute value of $r$. That is, we require

$$|\gamma(x_1, x_2, \ldots, x_K)| \le r \quad \text{for all possible values of the } K\text{-tuple } (x_1, x_2, \ldots, x_K). \tag{4.7}$$

This  restriction  leads  to  a  suboptimal  discriminator,  but  allows  us  to  obtain  a  solution. 

Now assuming that $r$, $a$, and $b$ are all specified constants, the structure of our test
allows us to scale $r$, $a$, and $b$ to get a test with a nonlinearity with a maximum absolute
value of 1. The newly scaled thresholds shall be denoted as $a$ and $b$. This rescaling of $r$ to 1
allows us to utilize a perceptron neural network with a sigmoid nonlinearity on its nodes in
the following paragraphs.
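The rescaling can be written out explicitly. The following is a brief sketch, assuming $r > 0$; the primes are a notation introduced here only to distinguish the scaled quantities from the original ones.

```latex
\gamma' = \frac{\gamma}{r}, \qquad
T_n' = \frac{T_n}{r}, \qquad
a' = \frac{a}{r}, \qquad
b' = \frac{b}{r},
\qquad \text{so that} \qquad
|\gamma'| \le 1
\quad \text{and} \quad
a < T_n < b \iff a' < T_n' < b'.
```

Because every quantity is divided by the same positive constant, the scaled test crosses its thresholds at exactly the same sample as the original test and makes the same decision.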

To find the optimal nonlinearity within our class, we first consider the optimal paths
that the test statistic $T_n$ can take under each hypothesis. By optimal path we mean the path
that $T_n$ should take to minimize the number of samples needed to cross the correct threshold
under the appropriate hypothesis. Obviously the quickest path to reach a threshold is when
the discriminator takes a step of magnitude 1 in the appropriate direction upon obtaining
each new data sample. That is, for each new data sample, the test statistic under $H_1$
is incremented by $+1$, while the test statistic under $H_0$ is incremented by $-1$. If the data
sequence $\{Z_j\}$ is obtained by sampling some continuous process with a uniform sampling
period $T$, then the optimal path for $T_n$ would lie on a straight line with slope $+\frac{1}{T}$ for $H_1$
and slope $-\frac{1}{T}$ for $H_0$. Figure 4.5 depicts these paths. Thus, for an ideal discriminator (that
is, a discriminator which never makes mistakes and always uses the minimum number of
samples possible), the statistics of the nonlinearity should be

$$\begin{aligned}
E_1[\gamma(Z_1, Z_2, \ldots, Z_K)] &= 1 \\
E_0[\gamma(Z_1, Z_2, \ldots, Z_K)] &= -1 \\
\mathrm{Var}_1[\gamma(Z_1, Z_2, \ldots, Z_K)] &= 0 \\
\mathrm{Var}_0[\gamma(Z_1, Z_2, \ldots, Z_K)] &= 0
\end{aligned} \tag{4.8}$$


We cannot expect a real discriminator to achieve the statistics of the above equations.
However, we can choose the nonlinearity to minimize some performance measure, such
as a mean squared error criterion of $\gamma$ about its desired values. We show that the
back-propagation algorithm can be used to minimize a related mean squared error criterion.

We form a nonlinearity by constructing a perceptron neural network with $K$ inputs
$x_1, x_2, \ldots, x_K$ and two outputs which are functions of the inputs (and the weights/offsets
for each perceptron in the network), $o^1(x_1, x_2, \ldots, x_K)$ and $o^0(x_1, x_2, \ldots, x_K)$. To simplify
the notation we denote the output nodes as $o^1$ and $o^0$. During training (see Section 4.3), the
desired values of the output nodes are (1 0) for inputs from $H_0$ and (0 1) for inputs from $H_1$.
Our notation (x y) implies that $o^0 = x$ and $o^1 = y$. The nonlinearity, $\gamma(x_1, x_2, \ldots, x_K)$, is
formed by

$$\gamma(x_1, x_2, \ldots, x_K) = o^1(x_1, x_2, \ldots, x_K) - o^0(x_1, x_2, \ldots, x_K),$$

or with simplified notation,

$$\gamma = o^1 - o^0. \tag{4.9}$$

We wish the nonlinearity to be such that $\gamma$ is close to 1 for inputs from
$H_1$ and close to $-1$ for inputs from $H_0$. We choose a performance measure which involves the mean
squared error of $o^1$ and $o^0$ about their desired values for each hypothesis:

$$S_5 = E_0\left[(1 - o^0)^2 + (0 - o^1)^2\right] + E_1\left[(0 - o^0)^2 + (1 - o^1)^2\right]. \tag{4.10}$$

We would like the weight and node offset values of each perceptron in our neural network
to have values which minimize equation (4.10).

We now try to relate this performance measure via an intuitive argument to performance
measures from the previous chapters. Comparing (4.10) to our performance measure
$S_3$ from Chapter 2, we notice that they are similar. Recall that


$$S_3 = \frac{[\mu_1 - \mu_0]^2}{(\sigma_1^2 + \sigma_0^2)}. \tag{4.11}$$


Effectively, by maximizing equation (4.11), the expected values of the nonlinearity are
separated, while the variances about the expected values are minimized. Minimizing our
performance measure $S_5$ fixes the difference of the desired values, and minimizes a second
order moment. Both performance measures try to keep the expected values separated while
minimizing a second order moment about (or near) the expected value.




Recall that the back-propagation algorithm [12] is a gradient descent algorithm which
minimizes the performance measure

$$E = \frac{1}{2} \sum_{p} \sum_{j} \left(t_j^p - o_j^p\right)^2, \tag{4.12}$$

where $t_j^p$ is the desired output for node $j$ associated with input pattern $p$, and $o_j^p$ is the actual
value of the output node $j$ associated with input pattern $p$. Suppose we have $P$ $K$-tuples
from each hypothesis available for training the neural network. We also have $j = 0, 1$ for
the two output nodes $o^0$ and $o^1$, respectively. We can rewrite (4.12) as

$$E = \frac{1}{2} \sum_{p=0}^{P-1} \left\{(1 - o^0)^2 + (0 - o^1)^2\right\} + \frac{1}{2} \sum_{p=P}^{2P-1} \left\{(0 - o^0)^2 + (1 - o^1)^2\right\}, \tag{4.13}$$

where the first sum is over the $H_0$ training patterns and the second sum is over the $H_1$
training patterns. The problem of minimizing $E$ is equivalent to minimizing $E$ scaled by a
positive constant. Thus minimizing (4.13) is equivalent to minimizing

$$\frac{2}{P} E = \frac{1}{P} \sum_{p=0}^{P-1} \left\{(1 - o^0)^2 + (0 - o^1)^2\right\} + \frac{1}{P} \sum_{p=P}^{2P-1} \left\{(0 - o^0)^2 + (1 - o^1)^2\right\}. \tag{4.14}$$

Now as $P \to \infty$ we have

$$\frac{2}{P} E \to E_0\left[(1 - o^0)^2 + (0 - o^1)^2\right] + E_1\left[(0 - o^0)^2 + (1 - o^1)^2\right] = S_5, \tag{4.15}$$

which is our desired performance measure. Consequently, the back-propagation algorithm
is a reasonable algorithm to be utilized for our perceptron neural network nonlinearity.
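The relationship between the back-propagation error and the sample estimate of the performance measure can be checked numerically. The sketch below assumes the conventional factor of one half in the back-propagation error and uses made-up network outputs; the function names and the output values are hypothetical.

```python
def squared_error_sum(outputs, desired):
    """Sum over patterns of (d0 - o0)^2 + (d1 - o1)^2 for one desired pair (d0 d1)."""
    d0, d1 = desired
    return sum((d0 - o0) ** 2 + (d1 - o1) ** 2 for o0, o1 in outputs)

def backprop_error(outputs_h0, outputs_h1):
    """Half the total squared error over all training patterns (assumed form of E)."""
    return 0.5 * (squared_error_sum(outputs_h0, (1, 0))
                  + squared_error_sum(outputs_h1, (0, 1)))

def s5_estimate(outputs_h0, outputs_h1):
    """Sample-mean estimate of the performance measure, one mean per hypothesis."""
    return (squared_error_sum(outputs_h0, (1, 0)) / len(outputs_h0)
            + squared_error_sum(outputs_h1, (0, 1)) / len(outputs_h1))

# Hypothetical outputs (o0, o1) of a trained network, P = 2 patterns per hypothesis.
outputs_h0 = [(0.9, 0.1), (0.8, 0.3)]
outputs_h1 = [(0.2, 0.7), (0.1, 0.9)]
P = len(outputs_h0)
E = backprop_error(outputs_h0, outputs_h1)
scaled_E = (2.0 / P) * E  # equals the sample estimate when both classes have P patterns
```

With an equal number of patterns from each hypothesis, `scaled_E` and `s5_estimate(outputs_h0, outputs_h1)` agree up to floating-point rounding, so a weight setting that minimizes one also minimizes the other.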
Figure 4.6: Sequential Neural Network Discriminator

Using this nonlinearity we can form the test statistic

$$T_n = \sum_{j=K}^{n} \gamma(Z_{K-1+j}, Z_{K-2+j}, \ldots, Z_j). \tag{4.16}$$

Figure 4.6 shows the implementation of our test. The incoming data samples are passed
through a tapped delay line. The $K$ taps are the inputs to the perceptron neural network.
The difference of the outputs of the neural network is formed and added to the test statistic
$T_j$. The subscript $j$ denotes values associated with the $j$th data sample.
The sample number $j$ is compared to $N$. If $j$ reaches $N$, then a one-threshold test is
performed (in this figure the threshold is 0). If $j$ is less than $N$, then a two-threshold test is
performed.
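A tapped delay line followed by an accumulator can be sketched in Python as below. The stand-in nonlinearity in the usage note is purely hypothetical (a trained network would take its place), and presenting the taps oldest sample first is an assumption about ordering.

```python
from collections import deque
from itertools import accumulate

def k_tuples(samples, K):
    """Yield successive K-tuples of adjacent samples, as a tapped delay line would."""
    taps = deque(maxlen=K)  # the oldest sample drops out as each new one arrives
    for z in samples:
        taps.append(z)
        if len(taps) == K:  # wait until the delay line has filled
            yield tuple(taps)

def statistic_path(samples, gamma, K):
    """Running sum of the nonlinearity gamma applied to each K-tuple."""
    return list(accumulate(gamma(t) for t in k_tuples(samples, K)))
```

As a usage example, `statistic_path([1, 2, 3, 4], lambda t: 1 if t[1] > t[0] else -1, 2)` applies a toy two-input nonlinearity to the three adjacent pairs and returns the running sums `[1, 2, 3]`.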


4.3  Neural  Network  Training  Phase 


The neural network used in our sequential discrimination scheme operates on $K$-tuples
$(Z_{K-1+j}, Z_{K-2+j}, \ldots, Z_j)$, which are formed from the incoming data sequence $\{Z_j\}_{j=1}^{\infty}$ on
which the discriminator must make a decision of $H_1$ or $H_0$. The neural network may have
two or three layers of nodes, but it will always have two output nodes on the output layer.
Figures 4.7 and 4.8 depict the two possible forms of the neural network considered in this
thesis.


The neural network is trained using the back-propagation algorithm and the training
data set. The training data set consists of $M$ sample paths of length $N$ from each hypothesis.
These training data are denoted $\zeta^i_{m,j}$, where $i = 0, 1$ denotes the hypothesis ($H_0$ or $H_1$),
$m = 0, 1, \ldots, M-1$ denotes the sample path number, and $j = 0, 1, \ldots, N-1$ denotes the
sample number. The desired responses for the neural network are (1 0) for $H_0$ and (0 1)
for $H_1$. Our notation (a b) implies that output node 0 outputs a and output node 1
outputs b.

Figure 4.7: A Two Layer Perceptron Neural Network

Figure 4.8: A Three Layer Perceptron Neural Network

The training process proceeds as follows: The first $K$-tuple from the first sample
path from $H_0$, $(\zeta^0_{0,0}, \zeta^0_{0,1}, \ldots, \zeta^0_{0,K-1})$, is presented to the neural network inputs. The
back-propagation algorithm is performed using (1 0) as the desired output. Then the first
$K$-tuple from the first sample path from $H_1$, $(\zeta^1_{0,0}, \zeta^1_{0,1}, \ldots, \zeta^1_{0,K-1})$, is presented to the neural
network inputs. Back-propagation is performed with the desired response of (0 1). Then
the second $K$-tuple from the first $H_0$ sample path, $(\zeta^0_{0,1}, \zeta^0_{0,2}, \ldots, \zeta^0_{0,K})$, is presented to the
network for back-propagation. Then the second $K$-tuple from the first $H_1$ sample path,
$(\zeta^1_{0,1}, \zeta^1_{0,2}, \ldots, \zeta^1_{0,K})$, is presented to the network for back-propagation. When all the
$K$-tuples (of ordered adjacent samples) from the first sample path for $H_0$ and $H_1$ have been
exhausted, the process is repeated for the remaining sample paths until they have all been exhausted.
Then the entire process is repeated until all sample paths have been presented to the network
$L$ times.
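The order in which training patterns are presented can be sketched as a Python generator. The function name and the returned tuple layout are hypothetical; the generator only enumerates which $K$-tuple is presented and with which desired response, and leaves the back-propagation update itself to the caller.

```python
def presentation_order(M, N, K, L):
    """Yield (hypothesis, path number, start index, desired output) in training order.

    The K-tuple presented is made of samples start, start + 1, ..., start + K - 1
    of the given sample path.
    """
    desired = {0: (1, 0), 1: (0, 1)}
    for _ in range(L):                      # L passes over the whole training set
        for m in range(M):                  # sample paths taken in order
            for start in range(N - K + 1):  # every K-tuple of adjacent samples
                for i in (0, 1):            # alternate H0, then H1
                    yield i, m, start, desired[i]
```

For example, `list(presentation_order(1, 3, 2, 1))` yields four presentations: the first pair from $H_0$, the first pair from $H_1$, the second pair from $H_0$, and the second pair from $H_1$.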


4.4  Determination  of  Thresholds  a  and  b 


The discriminators in Section 4.3 are trained to minimize the squared error of the
outputs, $o^1$ and $o^0$, about their desired values under each hypothesis. In effect, the average slope of the path of the test
statistic is forced towards $+1$ for $H_1$ and $-1$ for $H_0$. In this section we suggest a scheme
for determining practical values of the thresholds $a$ and $b$. Intuitively, as the thresholds are
moved farther away from zero, the probabilities of error decrease, while the average sample
size increases. Therefore, constraining

$$b = -a > 0 \tag{4.17}$$

to correspond to the larger of the desired $\alpha$ and $\beta$ probably will not affect the performance
of the discriminator; this assumes that the desired values of $\alpha$ and $\beta$ are small. We also
impose the following constraint on the maximum value of $b$:

$$b \le B. \tag{4.18}$$
We force our test to begin with $T_{K-1} = 0$. By utilizing the training data $\zeta^i_{m,j}$ and the neural
network discriminator, we generate the output data sequences

$$T^i_{m,j} \tag{4.19}$$

where $i = 0, 1$ denotes the hypothesis, $m = 0, 1, \ldots, M-1$ denotes the sample path number,
and where $j = K, K+1, \ldots, N-1$ denotes the test statistic number. Define a new set of
functions $e^i_m(b)$ by

$$e^i_m(b) = \begin{cases} 0, & \text{if the discriminator with thresholds } -b \text{ and } b \text{ chooses } H_i \text{ for path } T^i_m; \\ 1, & \text{otherwise.} \end{cases} \tag{4.20}$$

Now using the functions $e^1_0(b), e^1_1(b), \ldots, e^1_{M-1}(b)$ and $e^0_0(b), e^0_1(b), \ldots, e^0_{M-1}(b)$, define

$$e^1(b) = \frac{1}{M} \sum_{m=0}^{M-1} e^1_m(b), \qquad e^0(b) = \frac{1}{M} \sum_{m=0}^{M-1} e^0_m(b). \tag{4.21}$$

Thus $e^i(b)$ is the average number of errors for thresholds $-b$ and $b$ under hypothesis $H_i$.
So, as $M \to \infty$ we have

$$e^0(b) \to \alpha(b), \qquad e^1(b) \to \beta(b), \tag{4.22}$$

where $\alpha(b)$ and $\beta(b)$ are the probabilities of false alarm and miss, respectively. Notice that
they are functions of the threshold $b$.

Since it is not possible to generate a continuous function on the computer, we can
simulate equations (4.20) and (4.21) with discrete bins or intervals. In this manner, reasonable
values of $b$ (and $-b$) can be chosen to get desirable values of $\alpha$ and $\beta$. Using equation
(4.21), we choose the value of $b$ that satisfies the constraints

$$\beta \ge e^1(x) \ \text{for all } x \ge b, \qquad \alpha \ge e^0(x) \ \text{for all } x \ge b. \tag{4.23}$$
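One way to carry out this selection over a discrete set of candidate thresholds is sketched below in Python. Several points are assumptions made for illustration: the error estimates are required not to exceed the desired error probabilities, a path that never crosses a threshold is decided by a final comparison with 0, the smallest admissible candidate is the one chosen, and each path is a non-empty list of test statistic values.

```python
def decide(path, b):
    """Decision of the test with thresholds -b and b on one test statistic path."""
    for t in path:
        if t >= b:
            return 1
        if t <= -b:
            return 0
    return 1 if path[-1] > 0 else 0  # truncation: one-threshold test at 0

def error_rate(paths, hypothesis, b):
    """Fraction of paths generated under `hypothesis` that are decided wrongly."""
    return sum(1 for p in paths if decide(p, b) != hypothesis) / len(paths)

def choose_threshold(paths_h0, paths_h1, alpha, beta, candidates):
    """Smallest candidate b whose error estimates meet (alpha, beta) for all candidates >= b."""
    chosen = None
    for b in sorted(candidates, reverse=True):  # scan from the largest candidate down
        if (error_rate(paths_h0, 0, b) <= alpha
                and error_rate(paths_h1, 1, b) <= beta):
            chosen = b
        else:
            break  # the constraint fails here, so no smaller candidate qualifies
    return chosen  # None if even the largest candidate fails
```

Passing a candidate list capped at the maximum allowed threshold keeps the result within the bound on $b$.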

4.5  A  Scheme  for  Multiple  Hypothesis  Discrimination 


Generalizing the binary hypothesis neural network discriminator to a multiple hypothesis
discriminator can be achieved without much effort. Instead of two neural network
outputs $o^0$ and $o^1$, the neural network shall have $R$ outputs, $o^0, o^1, \ldots, o^{R-1}$, corresponding
to the $R$ hypotheses. The test statistic is now the vector
$\mathbf{T}_n = (T_n^0, T_n^1, \ldots, T_n^{R-1})^T$, where

$$\mathbf{T}_n = \sum_{j=K}^{n} \boldsymbol{\gamma}(Z_{K-1+j}, Z_{K-2+j}, \ldots, Z_j). \tag{4.24}$$

The nonlinearity $\boldsymbol{\gamma}_j = (\gamma_j^0, \gamma_j^1, \ldots, \gamma_j^{R-1})^T$ is formed by setting

$$\gamma_j^i = o_j^i, \quad \text{for } i = 0, 1, \ldots, R-1. \tag{4.25}$$

Thus, each component of $\boldsymbol{\gamma}_j$ has a maximum value of 1. Instead of a two-threshold
test, the multiple hypothesis sequential test utilizes $R$ thresholds $a^0, a^1, \ldots, a^{R-1}$. The test
proceeds as follows: Obtain a new data sample $Z_n$. Form the new test statistic $\mathbf{T}_n$. If
$T_n^i$ exceeds the other components of $\mathbf{T}_n$ by a margin of $a^i$, then stop the test and declare
$H_i$. If no decision is made for the sample $Z_n$, the next sample, $Z_{n+1}$, is obtained, $\mathbf{T}_{n+1}$ is
computed, and the test is repeated. Once again, after the maximum number of samples, $N$,
has been reached, a block test is performed. The block test is performed by choosing the
hypothesis which satisfies


$$\arg\min_{0 \le i \le R-1} \left\{ a^i - \left[ T_N^i - \sum_{0 \le k \le R-1} T_N^k \right] \right\}. \tag{4.26}$$

Figure 4.9 depicts the structure of the multiple hypothesis sequential test.
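The sequential stage of the multiple hypothesis test can be sketched in Python as follows. The sketch interprets "exceeds the other components by a margin" as a comparison against the largest of the other components, which is an assumption; the function name is hypothetical, and the final block test is deliberately left to the caller rather than reproduced here.

```python
def multi_hypothesis_test(increments, margins):
    """Sequential stage of an R-hypothesis test (R >= 2).

    increments: iterable of length-R vectors, one per new sample.
    margins: the R thresholds, one per hypothesis.
    Returns (decision, vectors_used); decision is None if no margin was met.
    """
    R = len(margins)
    totals = [0.0] * R
    used = 0
    for used, g in enumerate(increments, start=1):
        totals = [t + gi for t, gi in zip(totals, g)]
        for i in range(R):
            # Assumed rule: component i must lead the best competitor by margins[i].
            best_other = max(totals[k] for k in range(R) if k != i)
            if totals[i] - best_other >= margins[i]:
                return i, used
    return None, used  # no decision yet: a block test would be applied here
```

With three hypotheses, increments of (0.9, 0.1, 0.1) at every step, and all margins equal to 2, the first component leads by 0.8 per step, so hypothesis 0 is declared after the third step.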

Table 4.1: Discrimination Cases


4.6  Numerical  Results 


In this section, the performance characteristics of the neural network discriminators are
evaluated. As in Section 2.7, the data used for evaluation of the neural network
discriminators are simulated radar data. Table 4.1 summarizes the three data cases considered in
this section. Case 1 and Case 2 are identical to Case 1 and Case 2 from Section 2.7. A
new case, Case 3, is also considered. Case 3 has Rayleigh pdfs under both $H_1$ and $H_0$ with
matched means and powers of the marginals. The decorrelation time constants are identical
to those of Case 1 and Case 2. Radar envelope samples $\{Z_i\}_{i=0}^{\infty}$ are generated via computer
simulation by equations (2.52) through (2.55), just as in Section 2.7.

Tables 4.2 through 4.4 summarize the neural networks simulated and trained to
operate in the discriminator structure of Figure 4.6. The first column contains the designated
net name. The second column lists the number of inputs (i.e., $K$), while columns three and
four list $N_0$ and $N_1$, respectively. Recall from Figures 4.7 and 4.8 that $N_0$ and $N_1$ are the
number of nodes on layers 0 and 1. All of the neural networks listed in Tables 4.2 through
4.4 have two outputs. The final column in Tables 4.2 through 4.4 contains the performance
measure, $S_5$, which was estimated using the training data after completion of the training
phase.

All neural networks were trained using the back-propagation algorithm and the training
data with the method detailed in Section 4.3. The training data were also generated by
simulation using equations (2.52) through (2.55). The sample paths were generated so that the
number of samples in each path, $N$, was 1000. The number of sample paths from each
hypothesis, $M$, was set to 50. The constants for the back-propagation algorithm were chosen
by experimentation to get acceptable convergence rates. The gain was set to 0.001, while
the momentum was set to 0. Each sample path of the training data was presented to the
network 100 times (that is, using the notation of Section 4.3, $L = 100$). Since nets 4, 8, and
12 have three layers of nodes, we set $L = 200$ to allow for the expected slower convergence
rates associated with the additional layer.

Tables 4.5 through 4.7 summarize the results of the neural network discriminators.
The first column of each table contains the name of the neural network. The next two
columns contain the probability of false alarm and the probability of detection, respectively.
The next column contains the expected number of samples needed to make a decision. Each
discriminator was evaluated by simulating 10,000 sample paths from each hypothesis. The
probabilities of false alarm and detection were computed by dividing the number of false
alarms and correct detections, respectively, by 10,000. The expected number of samples, or
average sample number, was computed by averaging the number of samples needed to make
a decision for our test sample paths. The thresholds $a$ and $b$ were chosen by experimentation,
not by the method of Section 4.4. Our second choice of the thresholds, $a = -20$ and $b = 20$,
was used in the simulations presented in this section.
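The three summary figures reported in the tables can be computed from simulation outcomes as in the following Python sketch. The function name and the input layout (one decision per simulated path, with 1 meaning that the first hypothesis was chosen) are assumptions made for illustration.

```python
def summarize(decisions_h0, decisions_h1, samples_used):
    """Return (probability of false alarm, probability of detection, average sample number).

    decisions_h0 and decisions_h1 hold one decision (0 or 1) per simulated path
    generated under H0 and H1, respectively; samples_used holds the number of
    samples needed to reach a decision on every simulated path.
    """
    p_false_alarm = sum(d == 1 for d in decisions_h0) / len(decisions_h0)
    p_detection = sum(d == 1 for d in decisions_h1) / len(decisions_h1)
    average_sample_number = sum(samples_used) / len(samples_used)
    return p_false_alarm, p_detection, average_sample_number
```

For example, one false alarm in four paths under the null hypothesis and three detections in four paths under the alternative give probabilities of 0.25 and 0.75.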


Figure 4.10: Sample Paths from $H_1$ and $H_0$

Examining the results for Case 1 (see Table 4.5), we see that the discriminators
perform well. All discriminators correctly classify all 20,000 sample paths. As the
performance measure $S_5$ decreases from 1.014E-05 to 7.566E-08, the average sample number
decreases from 49 to 28. Figure 4.10 depicts typical sample paths under hypotheses $H_1$
and $H_0$, respectively. Figure 4.11 shows the corresponding test statistics for the neural network
discriminator. Recall from Section 2.8 that the 128 level uniform quantizer designed using
the nominal (i.e., known, not estimated) cdfs had a measured probability of false alarm of 0, a
probability of detection of 0.9896, and an average sample number of 782. Clearly, the neural
network scheme works significantly better than the memoryless discriminator schemes.

Figure 4.11: Test Statistics from $H_1$ and $H_0$

Case 2 results are tabulated in Table 4.6. Most of the discriminators for this case also
performed well. The discriminator using net 5 as its nonlinearity, however, had a probability
of false alarm as high as 0.0997. One can see that the performance measure $S_5$ for net 5 was
slightly higher than the performance measure for the other Case 2 nets. The Case 2 data also
imply that smaller values of the performance measure $S_5$ result in better performance
(in probabilities of error and/or average sample number). Recall from Table 2.7 that the
optimal 128 level uniform quantizer discriminator had a probability of
false alarm of 0, a probability of detection of 0.9919, and an average sample number of 2660.
Comparing this with the average sample number of 43 and the perfect classification of the
net 7 discriminator, we can conclude that the neural network discriminators outperformed
the quantizer discriminators for Case 2.

Net       P(false alarm)   P(detection)   Avg. sample number   S_5
net 1     0                1              49                   1.014 E-05
net 2     0                1              32                   3.259 E-06
net 3     0                1              32                   2.721 E-07
net 4     0                1              28                   7.566 E-08

Table 4.5: Performance of Case 1 Neural Network Discriminators

Net       P(false alarm)   P(detection)   Avg. sample number   S_5
net 5     0.0997           0.9999         132                  8.652 E-06
net 6     0                1              55                   4.075 E-06
net 7     0                1              43                   3.982 E-07
net 8     0.0001           1              59                   3.020 E-06

Table 4.6: Performance of Case 2 Neural Network Discriminators

Net       P(false alarm)   P(detection)   Avg. sample number   S_5
net 9     0.9904           0.9999         1000                 1.993 E-05
net 10    0                1              41                   4.012 E-06
net 11    0                1              36                   3.752 E-07
net 12    0.4619           0.9986         59                   1.893 E-05

Table 4.7: Performance of Case 3 Neural Network Discriminators

Case 3 results are given in Table 4.7. Recall that Case 3 was a Rayleigh versus Rayleigh case
with matched means and powers. Note that only the decorrelation times differed. Also note
that the quantizer discriminators from Chapter 2 could not be generated for this case because
the performance measure $S_3$ is always zero. The discriminator with two inputs, net 9,
performed very poorly, with its large probability of false alarm of 0.9904 and its large average
sample number of 1,000. However, we observe that this network had a large performance
measure, $S_5$ = 1.993 E-05. On the other hand, the discriminators using nets 10 and 11,
with their perfect classifications and relatively small average sample numbers of 41 and 36,
respectively, performed very well. The discriminator using net 12 did not perform well, since
its performance measure of 1.893 E-05 was too large. Net 12 probably required more time
to converge during its training phase.

A neural network was also trained for the multi-hypothesis discrimination scheme.
The number of hypotheses, $R$, for this experiment was four. Table 4.8 summarizes the
four hypotheses. Hypothesis $H_0$ had a Rayleigh pdf with decorrelation time of 0.013029
seconds. $H_1$ was lognormal with decorrelation time 0.13029 seconds. $H_1$ had a 0 dB mean
ratio and a 0 dB power ratio ($H_1$ vs. $H_0$). $H_2$ was Rayleigh with the same decorrelation time
as $H_1$, and had a 0 dB power ratio ($H_2$ vs. $H_0$). $H_3$ was Rician with the same decorrelation time
as $H_1$, a mean ratio of 0 dB, and a 6 dB power ratio ($H_3$ vs. $H_0$). Training data were again
generated using equations (2.52) through (2.55) in a computer simulation. The Rician data
were created in a manner identical to the Rayleigh data, except that the underlying Gaussian
processes had a nonzero mean. The number of sample paths, $M$, was set to 50 for this
experiment, while the maximum number of samples, $N$, was 2000. Training was performed
with the gain constant set to 0.001 and the momentum constant set to 0. The number of
presentations, $L$, was 300.


The multiple hypothesis discriminator was also evaluated via computer simulation.
Table 4.9 summarizes the performance of the multi-hypothesis experiment. Each row lists
the results for the 10,000 simulated sample paths from each hypothesis. The first column
lists the hypothesis number, the second column the number of decisions in favor of $H_0$, the
third the number of choices for $H_1$, the fourth the number of choices for $H_2$, the fifth the
number of choices for $H_3$, and the sixth column gives the average sample number for the
hypothesis listed in column one. The discriminator achieved over 94 percent correct
decisions under each hypothesis and an average sample number (averaged over all hypotheses)
of 266.

We have seen networks with various numbers of inputs, layers, and nodes perform
very well for our discrimination cases. We now consider the performance of a network with
a fixed number of inputs and a varying number of nodes. This will help to quantify our intuitive belief that
more nodes in a neural network will allow a finer tuning of its decision regions. Table 4.10
lists each neural network and the associated number of nodes on each level. Figure 4.12 is a
graph of the performance measure $S_5$ for each neural network in Table 4.10. Each network
was trained with the Case 1 training data. For the back-propagation algorithm, the gain
was 0.001 while the momentum was 0. Each sample path was presented during training
100 times (i.e., $L = 100$). The results shown in Figure 4.12 are intuitively pleasing since, as
the number of nodes increases (or number of levels for net f), we see that the performance
measure $S_5$ decreases. This result is expected, as more nodes and levels will allow more
hyperplanes to be constructed in the decision space.

To see how the number of presentations, $L$, affects the performance measure $S_5$, an
experiment was performed with a two layer network with $N_0$ set to 16. The network had two
inputs, and was trained to operate on discrimination Case 1. The gain term in the
back-propagation algorithm was set to 0.001, while the momentum term was 0. Figure 4.13 shows
the performance measure $S_5$ as a function of $L$, the training cycle number. One can see that
the curve is approximately a decaying exponential. At first there is poor performance (large
values of $S_5$). As $L$ increases, the performance improves until it reaches a steady-state
minimum. This is expected since, as the number of training cycles increases, the decision
region should converge to the optimal decision region (optimal in the mean squared error
sense).

Table 4.8: Hypotheses for a Multiple Hypothesis Discrimination Problem

Table 4.9: Results for Multiple Hypothesis Discrimination Problem

Chapter 5
Mismatch Performance Results


The discriminators in the previous chapters were constructed with a priori information
involving either known (or assumed) pdfs or training data. If training data were available,
either the pdfs were estimated to construct memoryless quantizer discriminators, or the training
data were used by a neural network and the back-propagation training algorithm. In many
real situations the discriminators are presented with data whose statistics are different from
those on which the discriminator was designed to operate. This could be the result of making
invalid assumptions about the statistics or of obtaining a non-representative set of training
data which results in less accurate estimates of the pdfs. Therefore, the discriminator which
is chosen by the designer should be robust; that is, the discriminator should not be overly
sensitive to changes in the statistics of the data.

In this chapter the performance of our discriminators is evaluated under mismatch
conditions (mismatch meaning that the data have different statistics from the data for which
the discriminators were originally designed). Since there is an infinite number of
possibilities for the statistics of the testing data, we can only present some representative
mismatch conditions. This mismatch study is certainly not comprehensive; the cases
considered, however, show some interesting characteristics of the performance of our
different discriminators under mismatch conditions.

Table 5.1: Tests for Mismatch of Decorrelation Times


5.1  Mismatch  of  Decorrelation  Times 


In this section we consider the performance of the discriminators under mismatch of the
decorrelation times $\tau_0$ and $\tau_1$. Since $\tau_0$ and $\tau_1$ correspond to the
correlation coefficients $\rho_0$ and $\rho_1$, this is effectively a mismatch of the higher
order pdfs. The marginal pdfs for these tests remain unchanged. Table 5.1 lists the
discrimination tests for mismatch of the decorrelation times $\tau_0$ and $\tau_1$. The
mismatch data for all four tests (test1.a, test1.b, test1.c and test1.d) are lognormal versus
Rayleigh with matched means and powers. The underlying Gaussians for $H_0$ have variance
equal to 4. The fourth column of Table 5.1 lists the decorrelation times.
The discriminators considered for these tests are the 128 level uniform quantizer
listed in Table 2.5, the 32 level uniform quantizer listed in Table 3.1, and the neural
network discriminator referred to as net 2 in Table 4.2. All of these discriminators were
designed for lognormal versus Rayleigh with matched means and powers and
$(\tau_1, \tau_0) = (0.13029, 0.013029)$.

Test1.a simply reverses the decorrelation times for which the discriminators were designed.
In test1.b, both decorrelation times are set to 0.13029. Both decorrelation times for test1.c
were 0.013029, while for test1.d both decorrelation times were 0.06.

One hundred sample paths from each hypothesis were generated according to the appropriate
distributions listed in Table 5.1 and presented to the discriminators. The sample paths
were generated in the same fashion as in previous chapters. Table 5.2 lists the computer
simulation results for test1.a, test1.b, test1.c and test1.d for the various discriminators.
Columns labelled $P_f$ contain the measured probability of false alarm and columns labelled
$P_d$ the measured probability of detection. Columns labelled $E[n]$ contain the average
number of samples required to make a decision.
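The three tabulated quantities can be estimated from simulation outcomes along the following lines. This is only a minimal sketch: the function name `summarize_runs`, the representation of a run as a `(decided_h1, samples_used)` pair, and the pooling of the average sample number over both hypotheses are assumptions made for illustration, and the outcomes shown are made up rather than taken from Table 5.2.

```python
def summarize_runs(h0_runs, h1_runs):
    """Estimate (Pf, Pd, E[n]) from simulated sample paths.

    Each run is a pair (decided_h1, samples_used). Pooling the average
    sample number over both hypotheses is an assumption of this sketch.
    """
    # False alarm: the path came from H0 but the decision was H1.
    pf = sum(1 for decided_h1, _ in h0_runs if decided_h1) / len(h0_runs)
    # Detection: the path came from H1 and the decision was H1.
    pd = sum(1 for decided_h1, _ in h1_runs if decided_h1) / len(h1_runs)
    runs = list(h0_runs) + list(h1_runs)
    mean_n = sum(n for _, n in runs) / len(runs)
    return pf, pd, mean_n


# Made-up outcomes for 100 sample paths per hypothesis (illustrative only).
h0_runs = [(False, 800)] * 98 + [(True, 900)] * 2
h1_runs = [(True, 850)] * 100
pf, pd, mean_n = summarize_runs(h0_runs, h1_runs)
print(pf, pd, mean_n)
```

With 100 sample paths per hypothesis, the estimated probabilities can only take values in steps of 0.01, so a reported probability of 0 means that no errors were observed in 100 paths.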

The quantizer discriminator from Chapter 2 (the memoryless quantizer discriminator
designed from known pdfs) performed well for all four tests. For all tests, the measured
probability of false alarm was less than 2 percent, while the probability of detection was
greater than 99 percent. The average sample size varied between 802 and 953. This is still
reasonable compared to the results from Chapter 2, namely average sample sizes of about
780 and similar probabilities of error.

The quantizer discriminator derived from the estimated pdfs did not perform well
for test1.a and test1.b; the probability of false alarm was 0.40 for both tests. This
discriminator had low error probabilities for test1.c, but a large average sample number of
3437. For test1.d, it had a probability of false alarm of 0.29, a probability of detection
of 1, and an average sample number of 2347.

Table 5.2: Results for Mismatch of Decorrelation Times

The neural network discriminator worked marginally well for test1.d, but it performed
poorly for the other tests. For test1.a, its probability of false alarm (i.e., the error
probability under $H_0$) was 0.94; this corresponded with $\tau_0$ being mismatched. For
test1.b, $\tau_0$ was also very different from its nominal value, and the discriminator had a
very high probability of false alarm. For test1.c, $\tau_1$ was very different from its
nominal value, and for this test the probability of detection was 0 (i.e., the probability of
error under $H_1$ was 1). For test1.d, the decorrelation times $\tau_0$ and $\tau_1$ were
both at values midway between their nominal values; the discriminator performed only
marginally well, with a low error probability under $H_1$ and a probability of false alarm
of 0.29.
The results in this section imply that the memoryless quantizer discriminators derived
from known pdfs tend to discriminate using the marginal pdfs more heavily than the bivariate
pdfs. This discriminator worked well for all four tests. The lognormal vs Rayleigh marginal
pdfs of Case 1 produce the sharp increase in the quantizer function for large values of the
observed data sample (see Figure 2.3); this corresponds to the tails of the lognormal density
being heavier than the tails of the Rayleigh density. For small values of the observed data
sample, the Rayleigh density is much larger than the lognormal density; this produces a
sharp drop in the quantization function for small values of the observed data sample.
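The tail comparison in the preceding paragraph can be illustrated numerically. The sketch below rests on assumptions not confirmed by the text: "matched means and powers" is read as equal first and second moments, the Rayleigh density is given the underlying Gaussian variance of 4 quoted in Section 5.1, and the helper name `matched_lognormal` is hypothetical.

```python
import math


def rayleigh_pdf(x, sigma2):
    """Rayleigh density; sigma2 is the variance of the underlying Gaussian."""
    return (x / sigma2) * math.exp(-x * x / (2.0 * sigma2)) if x > 0 else 0.0


def lognormal_pdf(x, mu, s2):
    """Lognormal density; mu and s2 are the mean and variance of log X."""
    if x <= 0:
        return 0.0
    return math.exp(-(math.log(x) - mu) ** 2 / (2.0 * s2)) / (
        x * math.sqrt(2.0 * math.pi * s2))


def matched_lognormal(sigma2):
    """Lognormal (mu, s2) with the same mean and power as a Rayleigh density.

    Assumes 'matched means and powers' means equal E[X] and E[X^2].
    """
    mean = math.sqrt(math.pi * sigma2 / 2.0)  # Rayleigh mean
    power = 2.0 * sigma2                      # Rayleigh second moment
    s2 = math.log(power / mean ** 2)          # lognormal: E[X^2] / E[X]^2 = exp(s2)
    mu = math.log(mean) - s2 / 2.0            # lognormal: E[X] = exp(mu + s2 / 2)
    return mu, s2


mu, s2 = matched_lognormal(4.0)
for x in (0.1, 10.0):
    print(x, rayleigh_pdf(x, 4.0), lognormal_pdf(x, mu, s2))
```

Under these assumptions the Rayleigh density dominates near the origin and the lognormal density dominates in the upper tail, which is the behaviour the paragraph attributes to the quantizer function of Figure 2.3.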

Apparently the memoryless quantizer discriminator derived from estimated pdfs is
more dependent upon the bivariate pdfs than the memoryless quantizer discriminator
derived from known pdfs. This could be attributed to poor estimation accuracy of the marginal
and bivariate pdfs.

The neural network discriminator, however, performed poorly for most of the tests.
This implies that the neural network discriminator places more emphasis on the higher
order pdfs than the memoryless quantizer discriminators do. Since the memoryless quantizer
discriminators use only one observed data sample at a time when forming their test statistic,
and since the data from these tests are correlated, one could expect the neural network
scheme with memory to perform better than the memoryless quantizer discriminators.

Table 5.3: Tests for Mismatch of Marginal pdfs


5.2  Mismatch  of  Marginal  pdfs 


In this section, the values of $\tau_0$ and $\tau_1$ remain unchanged, but the marginal pdfs
are varied. This implies that the bivariate pdfs change in shape, but the correlation between
samples is unchanged. The same discriminators used in Section 5.1 are used for the results
presented in this section.

Table 5.3 lists the six tests used in this section. Figures 5.1 through 5.6 illustrate the
nominal and mismatch marginal pdfs used for each test. Table 5.4 contains the corresponding
discrimination results, which were obtained by simulating 100 sample paths under each
hypothesis; the sample paths were generated in the same manner as before.

The experiment test2.a used data from Case 2 to evaluate our discriminators (which
were designed for Case 1). Examining Figure 5.1, we see that the actual pdf for $H_0$ was
unchanged from the nominal one, but that the pdf for $H_1$ had larger variance and a peak
moved to larger values of x. The results (see Table 5.4) show that the memoryless quantizer
discriminator derived from the known pdfs performed very poorly. The memoryless quantizer
discriminator designed from estimates of the Case 1 pdfs had reasonable error probabilities
($P_f = 0.04$ and $P_m = 0.09$) but a very large average sample number of 4263. The neural
network discriminator had a probability of false alarm of 0, a probability of miss of 0.34,
and an average sample number of 67.

Table 5.4: Results for Mismatch of Marginal pdfs

Figure 5.1: Probability Density Functions for test2.a

Experiment test2.b used data from Case 3 (see Chapter 4). Here the actual data from
both hypotheses had marginal pdfs matched to the nominal $H_0$ marginal pdf. Both memoryless
quantizer discriminators performed very poorly, while the neural network discriminator
had small probabilities of error and a small average sample number.

Figure 5.2: Probability Density Functions for test2.b

Experiment test2.c was Rayleigh vs Rayleigh with a power ratio of 9.0309 dB ($H_1$
versus $H_0$) and an underlying Gaussian variance for $H_0$ of 1. The marginal pdf for $H_1$
was effectively changed so that its peak occurred at a larger value of x than the nominal
$H_0$ marginal pdf. The marginal pdf for $H_0$ had its peak at smaller values of x than the
nominal marginal pdf for $H_0$. For this experiment, the memoryless quantizer discriminator
derived from known pdfs classified all sample paths correctly. The average sample number was
484. We attribute this performance to the probability under each hypothesis being shifted
towards the large jumps in the quantization function (see Figure 2.3), where discrimination
power exists. For this experiment, the memoryless quantizer discriminator derived from
estimated pdfs and the neural network discriminator performed marginally well.
Figure 5.3: Probability Density Functions for test2.c

For test2.d the $H_0$ marginal pdf was the same as in test2.c. However, the $H_1$ marginal
pdf was matched to the $H_0$ marginal pdf. In this experiment, the neural network
discriminator performed very well, with perfect classification and an average sample number
of 50. However, the quantizer discriminators performed poorly; note that their error
probabilities under $H_1$ were very large. The large $H_1$ error probabilities can be
attributed to the shift in probability toward the negative jump in the quantization function
(see Figure 2.3); this changes the test statistic $T_n$ in favor of $H_0$.

Figure 5.4: Probability Density Functions for test2.d

For test2.e, the $H_0$ marginal pdf was matched to the nominal $H_0$ marginal pdf. The
$H_1$ marginal pdf was lognormal with a 10.4391 dB mean and power ratio over $H_0$. With the
shift in mass to much greater values of x, the memoryless quantizer discriminator derived
from known pdfs performed very well. However, the quantizer discriminator derived from
estimated pdfs performed poorly. Note the drop-off in the quantizer function of Figure 3.5,
which was attributed to inaccuracies of the pdf estimates at the tails. The shift of the
$H_1$ pdf in this mismatch condition probably caused many of the $H_1$ data samples to be
mapped into negative values and to appear to be from $H_0$. The neural network discriminator
classified all sample paths correctly and had an average sample number of 38.

Figure 5.5: Probability Density Functions for test2.e

The final experiment, test2.f, was Rayleigh vs Rayleigh with a power difference of 0 dB
and a variance of the underlying Gaussians for $H_0$ of 10. Figure 5.6 shows that the peaks
occur at larger values of x than both nominal $H_1$ and $H_0$ marginal pdfs. Here, however,
the Rayleigh process tails were not heavy enough to cause the processes to be classified
consistently as $H_1$ by the quantizer discriminators. The quantizer from known pdfs
performed poorly, with large probabilities of error. The quantizer from estimated pdfs
classified all sample paths from $H_0$ correctly and 93 percent of the $H_1$ sample paths
correctly. The neural network discriminator classified all $H_0$ sample paths correctly and
classified 67 percent of the $H_1$ sample paths correctly.

Figure 5.6: Probability Density Functions for test2.f

Although the neural network discriminator did not perform better than the quantizer
discriminators in all cases, it showed itself to be less sensitive to changes in the marginal
pdfs. The quantizer discriminators worked very well for some of these experiments but
performed very poorly for others. These results indicate that memoryless quantizer
discriminators rely more heavily on marginal pdfs, while the neural network discriminators,
which have memory, rely on higher order pdfs, that is, on correlation. These results are to
be expected.
Chapter 6


Conclusion 


In the previous chapters various schemes for discrimination were considered. Quantization
functions were derived that maximize performance measures shown to be useful in both
block and sequential discrimination schemes. In Chapter 2, by assuming the probability
densities of the data under each hypothesis, quantization functions were constructed for
use in discriminators; we consider this a parametric scheme, since pdfs were assumed.
In Chapter 3, non-parametric estimates of the marginal and bivariate pdfs were obtained
from the training data by use of kernel density estimators. These pdfs were input to the
expressions for the optimal quantization functions in Chapter 2. The resulting quantization
functions were implemented in discriminators; we refer to these memoryless quantizer
discriminators as non-parametric. In Chapter 4, another non-parametric scheme was
considered. Multilayer perceptron neural networks were utilized to form the nonlinearities
used in the test statistic of discriminators. This scheme allowed for the design of
discriminators with memory without requiring knowledge or estimation of high order pdfs. The
neural network scheme utilized training data and the back-propagation training algorithm
to form a mean squared optimal non-parametric nonlinearity.
The memoryless quantizer discriminators in Chapter 2 performed reasonably well.
Their error probabilities were small (given enough quantization levels), but their average
sample numbers were high. These results indicate that more quantization levels give better
performance. Quantization functions with optimal breakpoints produced the same
discrimination performance as quantization functions with more quantization levels but
uniform breakpoints.

The kernel density estimation experiments in Chapter 3 supported the consistency theory:
results of experiments that estimated marginal densities showed that larger sets of data
resulted in more accurate estimates. Since no theory was available for consistency of higher
order pdf estimates for correlated observations, the consistency of the bivariate estimates
was not checked. The bivariate consistency was also not checked due to processing
limitations. Estimation of the sums of the bivariate pdfs described in Section 3.3 required
as much as 3 days of CPU time on a Convex-210 mini-supercomputer. If the grid over which the
estimates were computed were made denser to obtain more accurate quantization functions, much
more processing time would be required. Memoryless quantizer discriminators designed using
the estimated pdfs had reasonably low error probabilities but extremely high average sample
numbers.

The discriminators with memory constructed using multi-layer perceptron neural
networks and the back-propagation algorithm performed very well. With training times on
the order of a few hours, these neural network discriminators, for most experiments, had
probabilities of error too small to be measured (with 10,000 simulated sample paths)
and average sample numbers at least an order of magnitude smaller than those of the
memoryless quantizer discriminators. Experiments with the number of training cycles using the
back-propagation algorithm agreed with intuition: more training decreased the mean squared
error of the neural network outputs. Experiments with the number of nodes and layers were
also intuitively pleasing: more nodes on a layer decreased the mean squared error of the
neural network outputs. The addition of a third layer to the neural network further allowed
the back-propagation algorithm to fine-tune the nonlinearity, thus reducing the mean squared
error.

The  nonlinearity  constructed  by  the  neural  network  was  generalized  to  operate  in  a 
multiple  hypothesis  classification  scheme.  Simulation  showed  that  the  scheme  could  classify 
four  hypotheses  with  error  probabilities  less  than  6  percent  and  an  average  sample  number 
of  266. 

The use of neural networks as the nonlinearities used in forming a test statistic certainly
merits further study. Topics that were not addressed in this thesis but might be explored
are how to set the training constants and how to allocate the number of nodes and layers of
the perceptron network. Comparisons could be made between the neural network nonlinearities
and the optimal nonlinearities formed with knowledge of the high-order pdfs.

The mismatch results indicate that the memoryless quantizer discriminators are
sensitive to changes in the marginal pdfs. The neural network schemes were less sensitive to
changes in the marginal pdfs but more sensitive to changes in the higher order pdfs and
correlation. The addition of memory to the discriminator apparently explains this phenomenon.
The robustness of neural network discriminators clearly deserves further study.
Appendix A


Gradient  Evaluation 


The evaluation of the matrix $\partial P_i / \partial t_k$ is necessary for a gradient search
technique to maximize the performance measure with respect to the breakpoints, unless a
finite difference gradient computation is used. If more accuracy than that of a finite
difference gradient method is desired, such as when $P_i$ is not expected to change slowly
with varying breakpoints $t$, then the gradient must be calculated explicitly. This appendix
contains the necessary equations for computation of the matrix.

To compute $\partial P_i / \partial t_k$ we employ Leibnitz's rule. For a joint cumulative
distribution function $F_{XY}(a,b)$ we have

\[
F_{XY}(a,b) = \int_{-\infty}^{b} \int_{-\infty}^{a} f_{XY}(x,y)\,dx\,dy
\tag{A.1}
\]

where $f_{XY}(x,y)$ is the probability density function associated with $F_{XY}(a,b)$. For our
problem of finding optimal breakpoints and levels, we need to evaluate derivatives of the
form $\frac{\partial}{\partial a} F_{XY}(a,b)$, $\frac{\partial}{\partial b} F_{XY}(a,b)$, and
$\frac{d}{dc} F_{XY}(c,c)$.
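Using Leibnitz's rule [13] on $\frac{\partial}{\partial a} F_{XY}(a,b)$ we get

\[
\begin{aligned}
\frac{\partial}{\partial a} F_{XY}(a,b)
  &= \frac{\partial}{\partial a} \int_{-\infty}^{b} \int_{-\infty}^{a} f_{XY}(x,y)\,dx\,dy \\
  &= \frac{\partial}{\partial a} \int_{-\infty}^{a} g(b,x)\,dx \\
  &= \int_{-\infty}^{a} \frac{\partial}{\partial a} g(b,x)\,dx
     + g(b,a)\,\frac{\partial a}{\partial a}
     - g(b,-\infty)\,\frac{\partial(-\infty)}{\partial a} \\
  &= g(b,a) = \int_{-\infty}^{b} f_{XY}(a,y)\,dy
\end{aligned}
\tag{A.2}
\]

where $g(b,x)$ is defined as

\[
g(b,x) = \int_{-\infty}^{b} f_{XY}(x,y)\,dy.
\tag{A.3}
\]

In a similar manner it can be shown that

\[
\frac{\partial}{\partial b} F_{XY}(a,b) = \int_{-\infty}^{a} f_{XY}(x,b)\,dx
\tag{A.4}
\]

and

\[
\frac{d}{dc} F_{XY}(c,c) = \int_{-\infty}^{c} f_{XY}(x,c)\,dx + \int_{-\infty}^{c} f_{XY}(c,y)\,dy.
\tag{A.5}
\]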
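Equation (A.5) can be checked numerically against a finite difference derivative, which is the alternative mentioned at the start of this appendix. The sketch below is illustrative only: it assumes two independent standard normal variables so that the joint cdf has a closed form, it truncates the lower integration limit at -10, and all function names are hypothetical.

```python
import math


def phi(x):
    """Standard normal pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def joint_pdf(x, y):
    # Assumption for this sketch: X and Y are independent standard normals.
    return phi(x) * phi(y)


def joint_cdf(a, b):
    return Phi(a) * Phi(b)


def simpson(f, lo, hi, n=2000):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0


def diagonal_derivative_leibniz(c, lower=-10.0):
    """Right-hand side of (A.5), with -infinity truncated at `lower`."""
    return (simpson(lambda x: joint_pdf(x, c), lower, c)
            + simpson(lambda y: joint_pdf(c, y), lower, c))


def diagonal_derivative_fd(c, h=1e-5):
    """Central finite difference approximation of d/dc F(c, c)."""
    return (joint_cdf(c + h, c + h) - joint_cdf(c - h, c - h)) / (2.0 * h)


c = 0.7
print(diagonal_derivative_leibniz(c), diagonal_derivative_fd(c))
```

For this independent case both values should agree with the closed form $2\,\Phi(c)\,\phi(c)$, which gives a quick way to validate an implementation of the explicit gradient before applying it to correlated data.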


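Now consider the $n$th column and $\ell$th row of the matrix $P_i$:

\[
(P_i)_{n,\ell} = 2 \sum_{j=1}^{m_i} \Pr\nolimits_i\{ X_1 \in (t_{n-1}, t_n] \text{ AND } X_{j+1} \in (t_{\ell-1}, t_\ell] \}
  - (2m_i + 1)\,[F_i(t_n) - F_i(t_{n-1})]\,[F_i(t_\ell) - F_i(t_{\ell-1})].
\tag{A.6}
\]

We know that

\[
\begin{aligned}
\Pr\nolimits_i\{ X_1 \in (t_{n-1}, t_n] \text{ AND } X_{j+1} \in (t_{\ell-1}, t_\ell] \}
  ={}& F_i^{X_1,X_{j+1}}(t_n, t_\ell) + F_i^{X_1,X_{j+1}}(t_{n-1}, t_{\ell-1}) \\
     & - F_i^{X_1,X_{j+1}}(t_{n-1}, t_\ell) - F_i^{X_1,X_{j+1}}(t_n, t_{\ell-1}).
\end{aligned}
\tag{A.7}
\]

This yields

\[
\begin{aligned}
(P_i)_{n,\ell} ={}& 2 \sum_{j=1}^{m_i} \Big[ F_i^{X_1,X_{j+1}}(t_n, t_\ell) + F_i^{X_1,X_{j+1}}(t_{n-1}, t_{\ell-1})
     - F_i^{X_1,X_{j+1}}(t_{n-1}, t_\ell) - F_i^{X_1,X_{j+1}}(t_n, t_{\ell-1}) \Big] \\
   & - (2m_i + 1)\,[F_i(t_n) - F_i(t_{n-1})]\,[F_i(t_\ell) - F_i(t_{\ell-1})].
\end{aligned}
\tag{A.8}
\]

So, applying equations (A.2) through (A.5) to the above expression for $(P_i)_{n,\ell}$ for the
various values of $n$ and $\ell$, we get the following expressions for
$\frac{\partial}{\partial t_k} (P_i)_{n,\ell}$: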

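Case 1: $n, \ell \neq k$ and $n, \ell \neq k+1$

\[
\frac{\partial}{\partial t_k} (P_i)_{n,\ell} = 0
\tag{A.9}
\]

Case 2: $n = k$, $\ell \neq k$, $\ell \neq k+1$

\[
\begin{aligned}
\frac{\partial}{\partial t_k} (P_i)_{k,\ell}
 ={}& \frac{\partial}{\partial t_k} \Big\{ 2 \sum_{j=1}^{m_i} \Big[ F_i^{X_1,X_{j+1}}(t_k, t_\ell) + F_i^{X_1,X_{j+1}}(t_{k-1}, t_{\ell-1})
      - F_i^{X_1,X_{j+1}}(t_k, t_{\ell-1}) - F_i^{X_1,X_{j+1}}(t_{k-1}, t_\ell) \Big] \\
    & \qquad - (2m_i + 1)\,[F_i(t_k) - F_i(t_{k-1})]\,[F_i(t_\ell) - F_i(t_{\ell-1})] \Big\} \\
 ={}& 2 \sum_{j=1}^{m_i} \Big\{ \int_{-\infty}^{t_\ell} f_i^{X_1,X_{j+1}}(t_k, y)\,dy + 0
      - \int_{-\infty}^{t_{\ell-1}} f_i^{X_1,X_{j+1}}(t_k, y)\,dy - 0 \Big\}
      - (2m_i + 1)\,[F_i(t_\ell) - F_i(t_{\ell-1})]\,f_i(t_k) \\
 ={}& 2 \sum_{j=1}^{m_i} \int_{t_{\ell-1}}^{t_\ell} f_i^{X_1,X_{j+1}}(t_k, y)\,dy
      - (2m_i + 1)\,[F_i(t_\ell) - F_i(t_{\ell-1})]\,f_i(t_k)
\end{aligned}
\tag{A.10}
\]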
j-i  1 
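Case 3: $n = k+1$, $\ell \neq k$, $\ell \neq k+1$

\[
\begin{aligned}
\frac{\partial}{\partial t_k} (P_i)_{k+1,\ell}
 ={}& \frac{\partial}{\partial t_k} \Big\{ 2 \sum_{j=1}^{m_i} \Big[ F_i^{X_1,X_{j+1}}(t_{k+1}, t_\ell) + F_i^{X_1,X_{j+1}}(t_k, t_{\ell-1})
      - F_i^{X_1,X_{j+1}}(t_{k+1}, t_{\ell-1}) - F_i^{X_1,X_{j+1}}(t_k, t_\ell) \Big] \\
    & \qquad - (2m_i + 1)\,[F_i(t_{k+1}) - F_i(t_k)]\,[F_i(t_\ell) - F_i(t_{\ell-1})] \Big\} \\
 ={}& 2 \sum_{j=1}^{m_i} \Big\{ 0 + \int_{-\infty}^{t_{\ell-1}} f_i^{X_1,X_{j+1}}(t_k, y)\,dy - 0
      - \int_{-\infty}^{t_\ell} f_i^{X_1,X_{j+1}}(t_k, y)\,dy \Big\}
      - (2m_i + 1)\,[F_i(t_\ell) - F_i(t_{\ell-1})]\,[-f_i(t_k)] \\
 ={}& -2 \sum_{j=1}^{m_i} \int_{t_{\ell-1}}^{t_\ell} f_i^{X_1,X_{j+1}}(t_k, y)\,dy
      + (2m_i + 1)\,[F_i(t_\ell) - F_i(t_{\ell-1})]\,f_i(t_k)
\end{aligned}
\tag{A.11}
\]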


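Case 4: $n \neq k$, $n \neq k+1$, $\ell = k$

\[
\begin{aligned}
\frac{\partial}{\partial t_k} (P_i)_{n,k}
 ={}& \frac{\partial}{\partial t_k} \Big\{ 2 \sum_{j=1}^{m_i} \Big[ F_i^{X_1,X_{j+1}}(t_n, t_k) + F_i^{X_1,X_{j+1}}(t_{n-1}, t_{k-1})
      - F_i^{X_1,X_{j+1}}(t_n, t_{k-1}) - F_i^{X_1,X_{j+1}}(t_{n-1}, t_k) \Big] \\
    & \qquad - (2m_i + 1)\,[F_i(t_n) - F_i(t_{n-1})]\,[F_i(t_k) - F_i(t_{k-1})] \Big\} \\
 ={}& 2 \sum_{j=1}^{m_i} \Big\{ \int_{-\infty}^{t_n} f_i^{X_1,X_{j+1}}(x, t_k)\,dx + 0 - 0
      - \int_{-\infty}^{t_{n-1}} f_i^{X_1,X_{j+1}}(x, t_k)\,dx \Big\}
      - (2m_i + 1)\,[F_i(t_n) - F_i(t_{n-1})]\,f_i(t_k) \\
 ={}& 2 \sum_{j=1}^{m_i} \int_{t_{n-1}}^{t_n} f_i^{X_1,X_{j+1}}(x, t_k)\,dx
      - (2m_i + 1)\,[F_i(t_n) - F_i(t_{n-1})]\,f_i(t_k)
\end{aligned}
\tag{A.12}
\]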
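Case 5: $n \neq k$, $n \neq k+1$, $\ell = k+1$

\[
\begin{aligned}
\frac{\partial}{\partial t_k} (P_i)_{n,k+1}
 ={}& \frac{\partial}{\partial t_k} \Big\{ 2 \sum_{j=1}^{m_i} \Big[ F_i^{X_1,X_{j+1}}(t_n, t_{k+1}) + F_i^{X_1,X_{j+1}}(t_{n-1}, t_k)
      - F_i^{X_1,X_{j+1}}(t_n, t_k) - F_i^{X_1,X_{j+1}}(t_{n-1}, t_{k+1}) \Big] \\
    & \qquad - (2m_i + 1)\,[F_i(t_n) - F_i(t_{n-1})]\,[F_i(t_{k+1}) - F_i(t_k)] \Big\} \\
 ={}& 2 \sum_{j=1}^{m_i} \Big\{ 0 + \int_{-\infty}^{t_{n-1}} f_i^{X_1,X_{j+1}}(x, t_k)\,dx
      - \int_{-\infty}^{t_n} f_i^{X_1,X_{j+1}}(x, t_k)\,dx - 0 \Big\}
      - (2m_i + 1)\,[F_i(t_n) - F_i(t_{n-1})]\,[-f_i(t_k)] \\
 ={}& -2 \sum_{j=1}^{m_i} \int_{t_{n-1}}^{t_n} f_i^{X_1,X_{j+1}}(x, t_k)\,dx
      + (2m_i + 1)\,[F_i(t_n) - F_i(t_{n-1})]\,f_i(t_k)
\end{aligned}
\tag{A.13}
\]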


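Case 6: $n = k$, $\ell = k$

\[
\begin{aligned}
\frac{\partial}{\partial t_k} (P_i)_{k,k}
 ={}& \frac{\partial}{\partial t_k} \Big\{ 2 \sum_{j=1}^{m_i} \Big[ F_i^{X_1,X_{j+1}}(t_k, t_k) + F_i^{X_1,X_{j+1}}(t_{k-1}, t_{k-1})
      - F_i^{X_1,X_{j+1}}(t_k, t_{k-1}) - F_i^{X_1,X_{j+1}}(t_{k-1}, t_k) \Big] \\
    & \qquad - (2m_i + 1)\,[F_i(t_k) - F_i(t_{k-1})]\,[F_i(t_k) - F_i(t_{k-1})] \Big\} \\
 ={}& 2 \sum_{j=1}^{m_i} \Big\{ \int_{-\infty}^{t_k} f_i^{X_1,X_{j+1}}(x, t_k)\,dx + \int_{-\infty}^{t_k} f_i^{X_1,X_{j+1}}(t_k, y)\,dy \\
    & \qquad - \int_{-\infty}^{t_{k-1}} f_i^{X_1,X_{j+1}}(t_k, y)\,dy - \int_{-\infty}^{t_{k-1}} f_i^{X_1,X_{j+1}}(x, t_k)\,dx \Big\}
      - 2(2m_i + 1)\,f_i(t_k)\,[F_i(t_k) - F_i(t_{k-1})] \\
 ={}& 2 \sum_{j=1}^{m_i} \Big\{ \int_{t_{k-1}}^{t_k} f_i^{X_1,X_{j+1}}(x, t_k)\,dx + \int_{t_{k-1}}^{t_k} f_i^{X_1,X_{j+1}}(t_k, y)\,dy \Big\}
      - 2(2m_i + 1)\,f_i(t_k)\,[F_i(t_k) - F_i(t_{k-1})]
\end{aligned}
\tag{A.14}
\]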
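Case 7: $n = k+1$, $\ell = k+1$

\[
\begin{aligned}
\frac{\partial}{\partial t_k} (P_i)_{k+1,k+1}
 ={}& \frac{\partial}{\partial t_k} \Big\{ 2 \sum_{j=1}^{m_i} \Big[ F_i^{X_1,X_{j+1}}(t_{k+1}, t_{k+1}) + F_i^{X_1,X_{j+1}}(t_k, t_k)
      - F_i^{X_1,X_{j+1}}(t_{k+1}, t_k) - F_i^{X_1,X_{j+1}}(t_k, t_{k+1}) \Big] \\
    & \qquad - (2m_i + 1)\,[F_i(t_{k+1}) - F_i(t_k)]\,[F_i(t_{k+1}) - F_i(t_k)] \Big\} \\
 ={}& 2 \sum_{j=1}^{m_i} \Big\{ \int_{-\infty}^{t_k} f_i^{X_1,X_{j+1}}(x, t_k)\,dx + \int_{-\infty}^{t_k} f_i^{X_1,X_{j+1}}(t_k, y)\,dy \\
    & \qquad - \int_{-\infty}^{t_{k+1}} f_i^{X_1,X_{j+1}}(x, t_k)\,dx - \int_{-\infty}^{t_{k+1}} f_i^{X_1,X_{j+1}}(t_k, y)\,dy \Big\}
      + 2(2m_i + 1)\,f_i(t_k)\,[F_i(t_{k+1}) - F_i(t_k)] \\
 ={}& -2 \sum_{j=1}^{m_i} \Big\{ \int_{t_k}^{t_{k+1}} f_i^{X_1,X_{j+1}}(x, t_k)\,dx + \int_{t_k}^{t_{k+1}} f_i^{X_1,X_{j+1}}(t_k, y)\,dy \Big\}
      + 2(2m_i + 1)\,f_i(t_k)\,[F_i(t_{k+1}) - F_i(t_k)]
\end{aligned}
\tag{A.15}
\]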
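Case 8: $n = k$, $\ell = k+1$

\[
\begin{aligned}
\frac{\partial}{\partial t_k} (P_i)_{k,k+1}
 ={}& \frac{\partial}{\partial t_k} \Big\{ 2 \sum_{j=1}^{m_i} \Big[ F_i^{X_1,X_{j+1}}(t_k, t_{k+1}) + F_i^{X_1,X_{j+1}}(t_{k-1}, t_k)
      - F_i^{X_1,X_{j+1}}(t_k, t_k) - F_i^{X_1,X_{j+1}}(t_{k-1}, t_{k+1}) \Big] \\
    & \qquad - (2m_i + 1)\,[F_i(t_k) - F_i(t_{k-1})]\,[F_i(t_{k+1}) - F_i(t_k)] \Big\} \\
 ={}& 2 \sum_{j=1}^{m_i} \Big\{ \int_{-\infty}^{t_{k+1}} f_i^{X_1,X_{j+1}}(t_k, y)\,dy + \int_{-\infty}^{t_{k-1}} f_i^{X_1,X_{j+1}}(x, t_k)\,dx \\
    & \qquad - \int_{-\infty}^{t_k} f_i^{X_1,X_{j+1}}(t_k, y)\,dy - \int_{-\infty}^{t_k} f_i^{X_1,X_{j+1}}(x, t_k)\,dx \Big\} \\
    & - (2m_i + 1) \Big\{ f_i(t_k)\,[F_i(t_{k+1}) - F_i(t_k)] - f_i(t_k)\,[F_i(t_k) - F_i(t_{k-1})] \Big\} \\
 ={}& 2 \sum_{j=1}^{m_i} \Big\{ \int_{t_k}^{t_{k+1}} f_i^{X_1,X_{j+1}}(t_k, y)\,dy - \int_{t_{k-1}}^{t_k} f_i^{X_1,X_{j+1}}(x, t_k)\,dx \Big\}
      - (2m_i + 1)\,f_i(t_k)\,[F_i(t_{k+1}) - 2F_i(t_k) + F_i(t_{k-1})]
\end{aligned}
\tag{A.16}
\]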
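Case 9: $n = k+1$, $\ell = k$

\[
\begin{aligned}
\frac{\partial}{\partial t_k} (P_i)_{k+1,k}
 ={}& \frac{\partial}{\partial t_k} \Big\{ 2 \sum_{j=1}^{m_i} \Big[ F_i^{X_1,X_{j+1}}(t_{k+1}, t_k) + F_i^{X_1,X_{j+1}}(t_k, t_{k-1})
      - F_i^{X_1,X_{j+1}}(t_{k+1}, t_{k-1}) - F_i^{X_1,X_{j+1}}(t_k, t_k) \Big] \\
    & \qquad - (2m_i + 1)\,[F_i(t_{k+1}) - F_i(t_k)]\,[F_i(t_k) - F_i(t_{k-1})] \Big\} \\
 ={}& 2 \sum_{j=1}^{m_i} \Big\{ \int_{-\infty}^{t_{k+1}} f_i^{X_1,X_{j+1}}(x, t_k)\,dx + \int_{-\infty}^{t_{k-1}} f_i^{X_1,X_{j+1}}(t_k, y)\,dy \\
    & \qquad - \int_{-\infty}^{t_k} f_i^{X_1,X_{j+1}}(x, t_k)\,dx - \int_{-\infty}^{t_k} f_i^{X_1,X_{j+1}}(t_k, y)\,dy \Big\} \\
    & - (2m_i + 1)\,f_i(t_k)\,[F_i(t_{k+1}) - 2F_i(t_k) + F_i(t_{k-1})] \\
 ={}& 2 \sum_{j=1}^{m_i} \Big\{ \int_{t_k}^{t_{k+1}} f_i^{X_1,X_{j+1}}(x, t_k)\,dx - \int_{t_{k-1}}^{t_k} f_i^{X_1,X_{j+1}}(t_k, y)\,dy \Big\}
      - (2m_i + 1)\,f_i(t_k)\,[F_i(t_{k+1}) - 2F_i(t_k) + F_i(t_{k-1})]
\end{aligned}
\tag{A.17}
\]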
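As a consistency check on the case expressions (an added illustration, not part of the original derivation), suppose the samples are independent, so that $F_i^{X_1,X_{j+1}}(a,b) = F_i(a)\,F_i(b)$ and $f_i^{X_1,X_{j+1}}(a,b) = f_i(a)\,f_i(b)$ for every $j$. The sum over $j$ in (A.8) then contributes $2m_i$ copies of the product of the two interval probabilities, and the matrix entry collapses to

\[
(P_i)_{n,\ell} = \big[ 2m_i - (2m_i + 1) \big]\,[F_i(t_n) - F_i(t_{n-1})]\,[F_i(t_\ell) - F_i(t_{\ell-1})]
             = -[F_i(t_n) - F_i(t_{n-1})]\,[F_i(t_\ell) - F_i(t_{\ell-1})].
\]

Differentiating this directly for Case 2 ($n = k$, $\ell \neq k$, $\ell \neq k+1$) gives

\[
\frac{\partial}{\partial t_k} (P_i)_{k,\ell} = -f_i(t_k)\,[F_i(t_\ell) - F_i(t_{\ell-1})],
\]

which is what (A.10) yields under the same independence assumption, since $\int_{t_{\ell-1}}^{t_\ell} f_i(t_k)\,f_i(y)\,dy = f_i(t_k)\,[F_i(t_\ell) - F_i(t_{\ell-1})]$. The remaining cases can be checked in the same way.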
Appendix B

Back-Propagation Algorithm

Back-propagation is a training algorithm designed to minimize the mean square error
between the output of the perceptron neural network and the desired output of the network
for a given input vector. This is achieved via a gradient descent algorithm. One requirement
is that the nonlinearity be continuously differentiable. One commonly used continuously
differentiable nonlinearity is the sigmoid $f(y) = 1/(1 + e^{-y})$. The back-propagation
algorithm given below assumes a sigmoidal nonlinearity.

Step 1:

The weights and node offset values for all perceptrons in the network are initialized
to small random values.

Step 2:

The input vector from the training data, $z = (z_0, z_1, \ldots, z_{k-1})^T$, is presented as
an input to the perceptron neural network. The desired output of the neural network,
$d = (d^0, d^1, \ldots)^T$, is also specified at this stage.

Step 3:

The actual output of the network, $o = (o^0, o^1, \ldots)^T$, is computed by the network in
a feed forward manner.

Step 4:

Now the weights are adjusted. Starting at the output nodes and working down
towards the first layer of nodes, the weights are adjusted by

\[
W_{ij}(t+1) = W_{ij}(t) + \eta\,\delta_j\,x^i + \alpha\,\big(W_{ij}(t) - W_{ij}(t-1)\big).
\]

$W_{ij}(t)$ is the weight at time $t$ from node $i$ (or input $i$) to node $j$, $x^i$ is the
output of node $i$ (or is input $i$), $\eta$ is a gain term such that $\eta \in (0,1)$, and
$\alpha$ is a momentum term such that $\alpha \in (0,1)$. $\delta_j$ is an error term for
node $j$. For an output node $j$,

\[
\delta_j = o^j\,(1 - o^j)\,(d^j - o^j).
\]

For an internal node $j$,

\[
\delta_j = x^j\,(1 - x^j) \sum_k \delta_k\,W_{jk}
\]

where $k$ is over all nodes in the layers above node $j$. The node offset values are
adapted in a similar manner by assuming they are weights from constant valued
inputs.

Step 5:

Return to Step 2 and repeat the process for another training vector.
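Steps 1 through 5 can be sketched in a few lines of code. The following is a minimal illustration under stated assumptions, not the implementation used for the experiments in Chapter 4: the class name `TwoLayerNet`, the single hidden layer, the gain of 0.5, the initial weight range, and the toy OR training set are all hypothetical choices made for the example.

```python
import math
import random


def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))


class TwoLayerNet:
    """One hidden layer and one output layer of sigmoid nodes (sketch only)."""

    def __init__(self, n_in, n_hidden, n_out, eta=0.5, alpha=0.0, seed=0):
        rng = random.Random(seed)
        # Step 1: small random weights. The last entry of each row is the
        # node offset, treated as a weight from a constant input of 1.
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden + 1)]
                   for _ in range(n_out)]
        # Previous weight changes, W(t) - W(t-1), for the momentum term.
        self.dw1 = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
        self.dw2 = [[0.0] * (n_hidden + 1) for _ in range(n_out)]
        self.eta, self.alpha = eta, alpha

    def forward(self, z):
        # Step 3: feed-forward computation of the output.
        zin = list(z) + [1.0]
        hid = [sigmoid(sum(w * x for w, x in zip(row, zin))) for row in self.w1]
        hin = hid + [1.0]
        out = [sigmoid(sum(w * x for w, x in zip(row, hin))) for row in self.w2]
        return zin, hin, out

    def train_step(self, z, d):
        # Step 2: present input z and desired output d.
        zin, hin, out = self.forward(z)
        # Step 4: error terms, output nodes first, then internal nodes.
        d_out = [o * (1.0 - o) * (t - o) for o, t in zip(out, d)]
        d_hid = [hin[j] * (1.0 - hin[j])
                 * sum(d_out[k] * self.w2[k][j] for k in range(len(out)))
                 for j in range(len(self.w1))]
        self._update(self.w2, self.dw2, d_out, hin)
        self._update(self.w1, self.dw1, d_hid, zin)

    def _update(self, w, dw, delta, x):
        for j, row in enumerate(w):
            for i in range(len(row)):
                # eta * delta_j * x_i + alpha * (previous change in the weight)
                dw[j][i] = self.eta * delta[j] * x[i] + self.alpha * dw[j][i]
                row[i] += dw[j][i]


def mse(net, data):
    return sum((t - o) ** 2 for z, d in data
               for t, o in zip(d, net.forward(z)[2])) / len(data)


# Toy training set (logical OR), used only to exercise the update rule.
data = [((0, 0), (0,)), ((0, 1), (1,)), ((1, 0), (1,)), ((1, 1), (1,))]
net = TwoLayerNet(2, 3, 1)
mse_before = mse(net, data)
for _ in range(2000):          # Step 5: repeat for further training vectors
    for z, d in data:
        net.train_step(z, d)
mse_after = mse(net, data)
print(mse_before, mse_after)
```

The error terms for the hidden layer are computed before the output weights are changed, so every update in a training step uses the weights that produced the observed output.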
Appendix C

Probability Density Functions

The numerical results of Chapter 2 required knowledge of the marginal and bivariate cdfs
under each hypothesis. This appendix lists the expressions for the Rayleigh and lognormal
marginal pdfs and cdfs. The bivariate pdfs are listed, but the bivariate cdfs are not.
Bivariate cdfs were obtained via a Simpson's integration of the bivariate pdfs.

The Rayleigh marginal pdf is given by

\[
f(z) = \frac{z}{\sigma^2} \exp\!\left( -\frac{z^2}{2\sigma^2} \right), \qquad z \ge 0.
\tag{C.1}
\]

The constant $\sigma^2$ is the variance of the underlying Gaussian process. The Rayleigh
marginal cdf is obtained by integrating (C.1) and is given by

\[
F(z) = 1 - \exp\!\left( -\frac{z^2}{2\sigma^2} \right), \qquad z \ge 0.
\tag{C.2}
\]

The Rayleigh bivariate pdf has the form

\[
f(z,w) = \frac{zw}{\sigma^4 (1 - \rho^2)}
         \exp\!\left( -\frac{z^2 + w^2}{2(1 - \rho^2)\sigma^2} \right)
         I_0\!\left( \frac{\rho z w}{(1 - \rho^2)\sigma^2} \right).
\tag{C.3}
\]
In (C.3), $\rho$ is the correlation coefficient between $z$ and $w$, and $I_0(\cdot)$ is the
modified Bessel function of the first kind. The Rayleigh bivariate cdf is obtained by a
Simpson's integration of the bivariate pdf.

The lognormal marginal pdf is given by

\[
f(x) = \frac{1}{\sqrt{2\pi}\,\sigma x} \exp\!\left( -\frac{(\log x - \mu)^2}{2\sigma^2} \right).
\tag{C.4}
\]

Here again, $\sigma^2$ is the variance of the underlying Gaussian process and $\mu$ is the
mean of the underlying Gaussian process. By integration, the expression for the lognormal
marginal cdf is obtained as

\[
F(x) = \Phi\!\left( \frac{\log x - \mu}{\sigma} \right)
\tag{C.5}
\]

where $\Phi(\cdot)$ is the normal distribution function defined as

\[
\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp\!\left( -\frac{u^2}{2} \right) du.
\tag{C.6}
\]

The expression for the lognormal bivariate pdf is given as

\[
f(w,z) = \frac{1}{2\pi \sqrt{1 - \rho^2}\,\sigma^2 w z}
  \exp\!\left( -\left( \frac{(\log w - \mu)^2 + (\log z - \mu)^2}{2(1 - \rho^2)\sigma^2}
  - \frac{2\rho(\log w - \mu)(\log z - \mu)}{2(1 - \rho^2)\sigma^2} \right) \right).
\tag{C.7}
\]

Once again, $\rho$ denotes the correlation coefficient between $w$ and $z$. The lognormal
bivariate cdfs are obtained via Simpson's integration of the bivariate pdfs.
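The Simpson's integration of a bivariate pdf mentioned above can be sketched as follows. This is an illustration under assumptions, not the code used for the thesis results: the function names, the grid of 200 subintervals per axis, and the parameter values are hypothetical, and the check uses $\rho = 0$ only because the bivariate cdf then factors into a product of marginal cdfs that is known in closed form.

```python
import math


def rayleigh_pdf(z, sigma2):
    """Rayleigh marginal pdf with underlying Gaussian variance sigma2."""
    return (z / sigma2) * math.exp(-z * z / (2.0 * sigma2)) if z > 0 else 0.0


def rayleigh_cdf(z, sigma2):
    """Rayleigh marginal cdf."""
    return 1.0 - math.exp(-z * z / (2.0 * sigma2)) if z > 0 else 0.0


def lognormal_cdf(x, mu, sigma2):
    """Lognormal marginal cdf, written with the error function."""
    if x <= 0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / math.sqrt(2.0 * sigma2)))


def lognormal_bivariate_pdf(w, z, mu, sigma2, rho):
    """Lognormal bivariate pdf; taken to be zero outside the positive quadrant."""
    if w <= 0 or z <= 0:
        return 0.0
    a, b = math.log(w) - mu, math.log(z) - mu
    q = (a * a + b * b - 2.0 * rho * a * b) / (2.0 * (1.0 - rho * rho) * sigma2)
    return math.exp(-q) / (
        2.0 * math.pi * math.sqrt(1.0 - rho * rho) * sigma2 * w * z)


def simpson_2d(f, a, b, n=200):
    """Composite Simpson's rule for f over [0, a] x [0, b]; n must be even."""
    hx, hy = a / n, b / n
    wts = [1 if i in (0, n) else (4 if i % 2 else 2) for i in range(n + 1)]
    total = 0.0
    for i in range(n + 1):
        for j in range(n + 1):
            total += wts[i] * wts[j] * f(i * hx, j * hy)
    return total * hx * hy / 9.0


# Check with rho = 0, where the bivariate cdf factors into marginal cdfs.
approx = simpson_2d(lambda w, z: lognormal_bivariate_pdf(w, z, 0.0, 1.0, 0.0),
                    2.0, 2.0)
exact = lognormal_cdf(2.0, 0.0, 1.0) ** 2
print(approx, exact)
```

For correlated data the same routine applies with a nonzero `rho`; in that case no closed form is available for comparison, which is why the cdfs were obtained numerically.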